

ASYMPTOTIC APPROXIMATIONS TO DISTRIBUTIONS¹


By David L. Wallace
University of Chicago 


1. Introduction. The study of approximations to distributions formed a major 
part of statistical developments during the early part of this century and included 
important work by Charlier, Edgeworth, Pearson and numerous others. The 
principal problem was the approximation to empirical distributions by theo- 
retical functions and the methods proposed consisted chiefly either of choosing 
an approximating function from some class of functions, such as the Pearson 
type distributions or the Gram-Charlier functions, or of choosing a transforma- 
tion of the variable which would reduce the distribution to approximate nor- 
mality. 

With the increasing importance of statistical inference, interest in the original 
problem of approximating to empirical distributions virtually disappeared. But 
interest in approximations has continued because of the increasing number and 
complexity of theoretical distributions and the need for usable approximations 
to them. In addition to the direct use for approximate evaluation of the dis- 
tribution functions or the quantiles of complicated distributions, approximations 
have been valuable in such problems as the Behrens-Fisher problem and in the 
investigation of robustness of standard tests of hypotheses. 

There are several general approaches to distribution approximations. The one 
to which I restrict attention is that of finding asymptotic expansions—in which 
the errors of approximation approach zero as some parameter, typically a sample 
size, approaches infinity. Essentially, the method consists of finding improve- 
ments to the large sample approximations used throughout statistics. A variety 
of expansions have been developed for many problems and the approximations 
are amenable to theoretical as well as empirical study.

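In a simple and common form, each function $F_n(x)$ in a sequence of functions is
approximated by any partial sum of a series

$$\sum_{i=0}^{\infty} \frac{A_i(x)}{(\sqrt{n})^{i}}$$

and the errors satisfy the condition

$$\left| F_n(x) - \sum_{i=0}^{r} \frac{A_i(x)}{(\sqrt{n})^{i}} \right| \le \frac{C_r(x)}{(\sqrt{n})^{r+1}},$$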


that is, the errors, using any partial sum, are of the same order of magnitude as 
the first neglected term. I call an asymptotic expansion valid to r terms if the 


Received January 17, 1958; revised April 28, 1958. 

1 An Address presented on September 13, 1957 at the Atlantic City meeting of the Insti- 
tute of Mathematical Statistics by invitation of the IMS Committee on Special Invited 
Papers. This paper was prepared with the partial support of the Office of Naval Research. 




first r + 1 partial sums have this property, and valid uniformly in x if the
bounds $C_r(x)$ do not depend on x. (The theory of asymptotic expansions is given,
for example, by Erdélyi [28].)

There are a few points on the use of asymptotic expansions which have caused 
some confusion. Frequently, an expansion can be extended validly to infinitely 
many terms. For any fixed n, the infinite series may be convergent, but in 
statistical applications usually is not. The asymptotic property is a property of 
finite partial sums, and though the addition of the next term will for sufficiently 
large n improve the approximation, for any prescribed n it may not do so. 
Typically the bounds $C_r(x)$ increase rapidly with r, and for small n only the first
few terms are improvements. 

Ideally, sharp values of $C_r(x)$ should be known. (This is rare in statistical
applications but common in applications to special functions like the gamma or 
Bessel functions.) Then successive terms could be added until the error bound 
reaches its minimum, giving the best guaranteed approximation, or an earlier 
sum used if the error is small enough. But asymptotic expansions, except where 
convergent, have the inherent limitation that there is a minimum error which 
limits the accuracy achievable. 
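
The minimum-error behaviour can be seen numerically in Stirling's series for the logarithm of the gamma function, one of the special-function cases mentioned above. The following Python sketch is an illustration added here, not part of the original address; the function names are hypothetical and the Bernoulli numbers are the standard tabulated values. It adds correction terms one at a time and records the error against `math.lgamma`.

```python
import math

# Bernoulli numbers B_2, B_4, ..., B_20 (standard tabulated values).
BERNOULLI_EVEN = [1/6, -1/30, 1/42, -1/30, 5/66, -691/2730, 7/6,
                  -3617/510, 43867/798, -174611/330]

def stirling_log_gamma(x, m):
    """Stirling's asymptotic series for log Gamma(x) with m correction terms."""
    total = (x - 0.5) * math.log(x) - x + 0.5 * math.log(2.0 * math.pi)
    for k in range(1, m + 1):
        b2k = BERNOULLI_EVEN[k - 1]
        total += b2k / (2 * k * (2 * k - 1) * x ** (2 * k - 1))
    return total

def errors_by_terms(x, max_terms=10):
    """Absolute error of each partial sum (0..max_terms correction terms)."""
    exact = math.lgamma(x)
    return [abs(stirling_log_gamma(x, m) - exact) for m in range(max_terms + 1)]

errors_at_1 = errors_by_terms(1.0)
best_m = errors_at_1.index(min(errors_at_1))
for m, err in enumerate(errors_at_1):
    print(m, f"{err:.2e}")
print("number of correction terms giving the smallest error at x = 1:", best_m)
```

At x = 1 the error falls for the first few terms and then grows without bound, so the series cannot be pushed below a fixed minimum error; at larger x the same partial sums are far more accurate.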

For the asymptotic expansions used in statistics, the state of knowledge is much
less satisfactory. Usually, only the order of magnitude of the errors is known,
and only rarely are explicit bounds known, and these are far from sharp. Indeed,
many expansions in common use have been obtained by formal operations with 
terms collected according to their order of magnitude, but without proof that the 
errors are of correct order. I call these formal asymptotic expansions and will 
try to indicate where they can be proved valid by careful but simple analysis. 

The approximations discussed in this paper divide into two groups, the first 
consisting of approximations based ultimately on the central limit theorem and 
which use only the moments of the distribution to be approximated, and the 
second including various approximations using detailed information about the 
distribution. 


2. The central limit theorem. The center of a large part of the asymptotic
theory is the central limit theorem for sums of independent random variables.
Let $\{X_n\}$ be a sequence of independent random variables. Denote by $F_n$ the
distribution function of the standardized sum

$$Y_n = \frac{\sum_{i=1}^{n} (X_i - E(X_i))}{\sqrt{\sum_{i=1}^{n} \operatorname{Var}(X_i)}} \tag{2.1}$$

and by $\Phi$ the unit normal distribution function. The central limit theorem then
states that $\lim_{n\to\infty} F_n(x) = \Phi(x)$ for every fixed x, provided only that the means
and variances are finite. If the $\{X_i\}$ are not identically distributed, an additional
condition guaranteeing that the distributions are not too disbalanced is necessary
(Lindeberg [51]).


The best possible general results on the order of magnitude of the errors in
the central limit theorem were obtained during the 1940’s by Berry [7], Esseen
[29], [30], and Bergström [4], [5], [6]. Their results are of considerable interest and
their methods are extremely important in much of asymptotic theory. The
result for the sum of identically distributed random variables is that

$$\sup_x |F_n(x) - \Phi(x)| \le \frac{C\,\beta_3}{\sqrt{n}\,\sigma^3}$$

in which $\beta_3$ is the third absolute moment and $\sigma^2$ the variance of the component
random variables. Several values for the constant C have been published, but
only Berry’s calculations have been published. Hsu [45] pointed out an error in
Berry’s calculation. This error can be corrected without affecting the result, but
there is another more serious error. I have followed through the calculation and
have found that 2.05 is a satisfactory replacement for the value 1.88 given by
Berry. A more careful calculation would reduce this slightly. None of the other
bounds suggested is as low as 2.05. Recent work of Esseen [31] has shown that as
n approaches infinity, the minimum correct value of C approaches

$$\frac{\sqrt{10} + 3}{6\sqrt{2\pi}} \approx 0.41.$$

This value is achieved as n approaches infinity for a certain binomial distribution.
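
As a rough numerical illustration (added here, not part of the original address), the following Python sketch computes the exact maximum error of the normal approximation for a standardized binomial sum and compares it with the bound $C\beta_3/(\sqrt{n}\,\sigma^3)$ using the constant 2.05 quoted above. The choice p = 0.3 and all function names are assumptions made for the example.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_error_binomial(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for the standardized sum of n Bernoulli(p) variables."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    cdf_left = 0.0   # value of F_n just below the current jump point
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / sd
        cdf_right = cdf_left + math.comb(n, k) * p**k * q**(n - k)
        value = normal_cdf(x)
        # F_n is a step function, so the supremum is approached at a jump,
        # either from the left or from the right.
        worst = max(worst, abs(cdf_left - value), abs(cdf_right - value))
        cdf_left = cdf_right
    return worst

def berry_bound(n, p, c=2.05):
    """C * beta_3 / (sqrt(n) * sigma^3), with beta_3 = E|X - p|^3 for Bernoulli(p)."""
    q = 1.0 - p
    sigma = math.sqrt(p * q)
    beta3 = p * q * (p * p + q * q)
    return c * beta3 / (math.sqrt(n) * sigma**3)

# Limiting value of the best constant reported by Esseen.
esseen_limit = (math.sqrt(10.0) + 3.0) / (6.0 * math.sqrt(2.0 * math.pi))

for n in (10, 40, 160):
    err = sup_error_binomial(n, 0.3)
    print(n, round(err, 4), round(berry_bound(n, 0.3), 4))
print("limiting constant:", round(esseen_limit, 4))
```

In this example the bound is several times larger than the actual error, which is in keeping with the remark below that the bound is usually too large to be useful except for very large samples.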

The bound holds also for sums of nonidentically distributed random variables, 
though the second and third moments enter in more complicated ways. Although 
the corrected Berry constant is the lowest known, the results of Esseen and 
Bergström are generally stronger because of the way that the second and third
moments enter the bound. 

All of the methods proceed by choosing as a kernel a distribution whose density 
function has a sharp maximum at the origin. A bound on the maximum difference 
of any two functions $F(x) - G(x)$ can be obtained from any bound on the convolution
of this difference with the kernel distribution. The most common method
of bounding the convolution has been to pass by Parseval’s theorem to the 
characteristic functions and bound the resultant integral. 

Much earlier, Lyapounov ([52], [53]) obtained a bound of order $\log n/\sqrt{n}$
for the central limit theorem error by using a normal distribution with variance
of order 1/n for a kernel. Berry and Esseen were able to get the best result by
choosing kernel distributions whose characteristic functions vanished outside
a finite interval. The bounding then reduces to showing

$$\int_{-T}^{T} \left| \frac{f(t) - g(t)}{t} \right| dt = O\!\left(\frac{1}{T}\right)$$

where 1/T is the order of magnitude desired for the final result and where f and g
are the characteristic functions of F and G respectively. For the central limit
theorem, F is $F_n$, G is the normal distribution $\Phi$ and T is of order $\sqrt{n}$.

Bergström used the same choice as Lyapounov of a normal density for kernel,
but he worked directly with the convolution integral. His method has proved
valuable in extensions to the multivariate central limit theorems and he has
proved ([5], [6]) that the error there is again of order $1/\sqrt{n}$. The characteristic
function techniques have not here been used successfully.

While the central limit theorem is very useful theoretically and often in prac- 
tice, it is not always satisfactory. For small or moderate n, the errors of the 
normal approximation may be too large. Indeed, Berry’s bound on the error is 
usually intolerable except for very large samples. Error bounds for special classes 
of distributions (chiefly the binomial and Poisson distributions) have been
found by Uspensky [70] and others ([14], [83], [54], [55]).


3. Edgeworth series for sums. To obtain improvements and to prepare for 
later expansions, it will be convenient to develop a class of formal expansions 
sometimes known as the Charlier differential series [11]. In this formal development,
the parameter n plays no role. The expansion is based on a distribution
$\Psi$ which need not be a normal distribution. Let $\psi$ be its characteristic function
and $\{\gamma_r\}$ its cumulants. Let F be the distribution to be approximated, f its
characteristic function and $\{\kappa_r\}$ its cumulants. By the definition of the cumulants,
the characteristic functions satisfy the formal identity

$$f(t) = \exp\left( \sum_{r=1}^{\infty} (\kappa_r - \gamma_r) \frac{(it)^r}{r!} \right) \psi(t). \tag{3.1}$$

If now, $\Psi$ and all its derivatives vanish at the extremes of the range of x and
exist for all x in that range, then by integration by parts, $(it)^r \psi(t)$ is the
characteristic function of $(-1)^r \Psi^{(r)}(x)$. Introducing the differential operator D to
represent differentiation with respect to x, the formal identity corresponds termwise
in any formal expansion to the formal identity

$$F(x) = \exp\left( \sum_{r=1}^{\infty} (\kappa_r - \gamma_r) \frac{(-D)^r}{r!} \right) \Psi(x). \tag{3.2}$$

One can formally and apparently construct a distribution with prescribed cumulants
by choosing $\Psi$ and formally expanding.

The most important developing function $\Psi(x)$ is a normal distribution and with
that choice, the formal expansion had been given earlier by Chebyshev [13],
Edgeworth [27] and Charlier [10].

Chebyshev and Charlier proceeded by expanding and collecting terms accord- 
ing to the order of the derivatives. The resulting expansion is most commonly 
known as the Gram-Charlier A series and is identical with the formal expansion 
of $F - \Psi$ in Hermite orthogonal functions. It is a least squares expansion in
derivatives of the normal integral $\Psi$ with respect to a weight function which is
the reciprocal of the normal density $\Psi'$. In this form, the expansion was developed
and studied earlier by Chebyshev [12], Gram [41] and others.

The A-series converges for functions F whose tails approach zero faster than
$(\Psi')^{1/2}$ (see Szegő [63] or Cramér [19]). Convergence obtains for all distributions on
finite intervals but few others of any interest. The developing normal distribution
is usually chosen to have the same mean and variance as the given distribution F.
This choice has no effect on convergence, though it clearly has a tremendous
effect on the quality of approximation by the first few terms. Altogether, the
convergence properties are of little value and the importance of the Gram-Charlier
series arises from its properties as an inferior form of an asymptotic
expansion.

The preferable development was done by Edgeworth as an improvement to 
the central limit theorem. Let the distribution to be approximated again be the 
distribution F,, of the standardized sum Y,, (eqn. 2.1) of independent random 
variables. Take the component random variables identically distributed with 
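mean $\mu$, variance $\sigma^2$, and higher cumulants $\{\sigma^r \lambda_r;\ r \ge 3\}$. Take the developing
function $\Psi$ to be the unit normal distribution function $\Phi$. Then the cumulant
differences in the formal identity (3.1) are

$$\kappa_1 - \gamma_1 = 0, \qquad \kappa_2 - \gamma_2 = 0, \qquad \kappa_r - \gamma_r = \frac{\lambda_r}{n^{r/2 - 1}} \quad (r \ge 3).$$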

The Edgeworth series is obtained by collecting terms in the formal expansion
according to powers of n, thus yielding a formal asymptotic expansion of the
characteristic function of the form

$$f_n(t) = \left( 1 + \sum_{r=1}^{\infty} \frac{P_r(it)}{n^{r/2}} \right) e^{-t^2/2}$$

with $P_r$ a polynomial of degree 3r with coefficients depending on the cumulants
of orders 3 through r + 2. If powers of $\Phi$ are interpreted as derivatives, the
corresponding distribution function expansion is

$$F_n(x) = \Phi(x) + \sum_{r=1}^{\infty} \frac{P_r(-\Phi(x))}{n^{r/2}}.$$

It is important to note that every term beyond the normal approximation can
be expressed as the product of the normal density and a polynomial in x. The
first few terms of the expansion are:

$$F_n(x) = \Phi(x) - \frac{\lambda_3 \Phi^{(3)}(x)}{6\sqrt{n}} + \frac{1}{n}\left[ \frac{\lambda_4 \Phi^{(4)}(x)}{24} + \frac{\lambda_3^2 \Phi^{(6)}(x)}{72} \right] + \cdots$$

In 1928, Cramér [20] proved the series valid uniformly in x, but gave no explicit 
bounds on errors. Apart from requiring that one more cumulant exist than used 
in any partial sum, the proof assumes that the characteristic function h of the 
component random variables satisfies the condition 


$$\limsup_{|t| \to \infty} |h(t)| < 1. \tag{3.3}$$

This is satisfied if the component distribution has an absolutely continuous part.
It is not satisfied for discrete distributions and the result then is generally not
true.
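
A small numerical check may make the improvement over the normal approximation concrete. The Python sketch below is an illustration added here and not part of the original address. It takes the component variables to be unit exponential (an assumed example, for which $\lambda_3 = 2$ and $\lambda_4 = 6$ and the exact distribution of the sum is a gamma distribution), writes the correction terms through order 1/n as products of the normal density and Hermite polynomials, and compares the partial sums with the exact distribution function on a grid. All function names are hypothetical.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exact_cdf(x, n):
    """Exact F_n(x) for the standardized sum of n Exp(1) variables."""
    s = n + x * math.sqrt(n)   # unstandardized value of the sum
    if s <= 0.0:
        return 0.0
    # For integer n: P(Gamma(n, 1) <= s) = 1 - exp(-s) * sum_{j<n} s^j / j!
    term, partial = 1.0, 1.0
    for j in range(1, n):
        term *= s / j
        partial += term
    return 1.0 - math.exp(-s) * partial

def edgeworth_cdf(x, n, lam3, lam4, order):
    """Normal term plus the first `order` Edgeworth corrections (order = 0, 1 or 2)."""
    he2 = x * x - 1.0
    he3 = x**3 - 3.0 * x
    he5 = x**5 - 10.0 * x**3 + 15.0 * x
    value = Phi(x)
    if order >= 1:
        # -lam3 * Phi'''(x) / (6 sqrt(n)), with Phi''' = He_2 * phi
        value -= lam3 * he2 * phi(x) / (6.0 * math.sqrt(n))
    if order >= 2:
        # Phi'''' = -He_3 * phi and Phi^(6) = -He_5 * phi
        value -= (lam4 * he3 / 24.0 + lam3**2 * he5 / 72.0) * phi(x) / n
    return value

def max_error(n, order, lam3=2.0, lam4=6.0):
    grid = [i / 10.0 for i in range(-30, 31)]
    return max(abs(edgeworth_cdf(x, n, lam3, lam4, order) - exact_cdf(x, n))
               for x in grid)

for order in (0, 1, 2):
    print(order, f"{max_error(10, order):.5f}")
```

The comparison says nothing about explicit error bounds, which remain unknown in general; it only shows the orders of magnitude at work for one continuous component distribution.
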

The elementary proofs given later by Esseen [30] and Hsu [45] use the method
developed for the central limit theorem bound and amount to showing that

$$\int_{-T}^{T} \left| \frac{f_n(t) - g_{nk}(t)}{t} \right| dt = O\!\left(\frac{1}{T}\right)$$

with $T = c\,(n^{1/2})^k$ and with $g_{nk}(t)$ the expansion of the characteristic function
through terms of order $(1/n^{1/2})^{k-1}$ and using cumulants through order k + 1.
Using a Maclaurin’s expansion of the characteristic function, the integral up to
$n^{1/2}$ is easily bounded by $c_2 \beta_{k+2}/T$ with the unknown distribution entering only
through the absolute moment of order k + 2. An efficient determination of
$c_2$ would be extremely difficult.

Using the Cramér condition (3.3) on the characteristic function, the integral
from $n^{1/2}$ to T is easily bounded by $c_3/T$. But by this evaluation, the resulting
bound $c_3$ depends on the unknown distribution through its characteristic function
and this even more seriously prevents the determination of any numerically
useful bounds.

Cramér [21] also proved the validity of the asymptotic expansion for sums of 
non-identically distributed random variables. The conditions are somewhat 
more restrictive. Cramér [20] showed that the termwise differentiated Edgeworth 
series is a valid expansion for the density function, provided the component 
random variables have a density function of bounded variation. Gnedenko and 
Kolmogorov [40] weaken this condition. They also present most of the work of 
Cramér and Esseen discussed here. 

Esseen [30] studied the expansion problem when the Cramér condition (3.3) 
on the characteristic function is not satisfied. The error in using the first approximation

$$\Phi(x) - \frac{\lambda_3 \Phi^{(3)}(x)}{6\sqrt{n}}$$

is of smaller order than $1/n^{1/2}$ provided only that the third moment is finite and
that the distribution is not a lattice distribution. If the distribution of the com- 
ponent random variables is lattice, i.e., takes all probability on a set of equally 
spaced points, a different expansion is available. The Edgeworth density function 
expansion is, except for a constant multiple, a valid expansion for the jumps 
at each possible point. The usual Edgeworth expansion for the distribution 
function can be modified by the addition of terms (discontinuous) so that the 
resultant expansion is a valid expansion, uniformly for all x. The corrections, 
when evaluated at the points half-way between possible values of the standardized
sum, have no effect of order $1/n^{1/2}$, but do for all higher orders. Thus, for
example, the usual Edgeworth series when applied to a binomial or Poisson distribution
and evaluated only at half-integers is correct through order $1/n^{1/2}$ but
needs a correction of order 1/n.
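
The half-integer statement can be checked directly for a binomial distribution. The Python sketch below is an illustration added here, not part of the original address; the parameter p = 0.2 and the function names are assumptions. It evaluates the normal approximation and the first Edgeworth approximation at the standardized half-integer points k + 1/2 and compares both with the exact binomial distribution function.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def half_integer_errors(n, p):
    """Max errors, over the half-integers k + 1/2, of the normal and the
    one-term Edgeworth approximations to the binomial(n, p) distribution function."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    lam3 = (q - p) / math.sqrt(p * q)   # standardized third cumulant of Bernoulli(p)
    cdf = 0.0
    err_normal = 0.0
    err_edgeworth = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * q**(n - k)   # exact P(S_n <= k)
        x = (k + 0.5 - n * p) / sd                   # standardized half-integer point
        normal = Phi(x)
        edgeworth = normal - lam3 * (x * x - 1.0) * phi(x) / (6.0 * math.sqrt(n))
        err_normal = max(err_normal, abs(normal - cdf))
        err_edgeworth = max(err_edgeworth, abs(edgeworth - cdf))
    return err_normal, err_edgeworth

for n in (20, 80):
    print(n, [round(e, 5) for e in half_integer_errors(n, 0.2)])
```

The remaining error of the one-term approximation is of the order 1/n discussed above; the sketch does not attempt the discontinuous correction terms.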

Since the Gram-Charlier A series is only a rearrangement of the Edgeworth 
series, its asymptotic properties follow directly. Of course, many higher terms 
must be used before all terms of the desired order are included. What makes the 
Gram-Charlier arrangement so bad in practice is that these extra terms involve
cumulants of much higher order. 

Multivariate Edgeworth series can be developed in complete analogy with 
the univariate expansions. Other than bounds for the normal approximation 
error, little theoretical work has been done with the multivariate expansions. 
Specifically, the multivariate Edgeworth series for sums of independent random 
variables has not been shown to be a valid asymptotic expansion. This would 
seem the most serious gap in theoretical knowledge of asymptotic approximations.


4. General Edgeworth and Cornish-Fisher series. Many sample functions 
have distributions asymptotically normal for increasing sample size, but not all 
admit asymptotic expansions beyond the normal distribution term. Expansions 
can be constructed for functions, such as most functions of sample moments, 
behaving asymptotically as sums of independent random variables. To illustrate 
with the simplest example, let $H(\bar{X})$ be an arbitrary function, not depending
on n, of the sample mean $\bar{X}$ in a sample of size n from a population with cumulants
$\{\mu, \sigma^2, \sigma^r \lambda_r\}$. The distribution of

$$W_n = \frac{\sqrt{n}\,(H(\bar{X}) - H(\mu))}{\sigma H'(\mu)} \tag{4.1}$$

is asymptotically unit normal, provided that $H'(\mu) \ne 0$ and H is smooth enough
at $\mu$. (The assumption $H'(\mu) \ne 0$ and its equivalent for functions of several
moments rule out many interesting functions for which no general theory of
asymptotic expansions is known.) Assume $H'(\mu) > 0$. Then the distribution
function $K_n$ of $W_n$ is given by

$$K_n(x) = P(W_n \le x) = P\left( \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \le \frac{\sqrt{n}}{\sigma}\left[ J\!\left( H(\mu) + \frac{\sigma H'(\mu)\,x}{\sqrt{n}} \right) - \mu \right] \right) + O(n^{-k/2})$$

in which J denotes the uniquely defined function inverse to H near $\mu$ and all
other solutions of the inequality are easily shown to be of higher order than
any power of 1/n. If the population satisfies the Cramér condition (3.3), the
standardized mean has a valid Edgeworth expansion so that

$$K_n(x) = \Phi(u) + \sum_{r=1}^{k-1} \frac{P_r(-\Phi(u))}{n^{r/2}} + O(n^{-k/2})$$

in which

$$u = \frac{\sqrt{n}}{\sigma}\left[ J\!\left( H(\mu) + \frac{\sigma H'(\mu)\,x}{\sqrt{n}} \right) - \mu \right].$$

Further, each derivative $\Phi^{(r)}(u)$ can be expanded in a Taylor series in $1/n^{1/2}$ about
$n = \infty$, evaluating the derivatives of J at $H(\mu)$ from the derivatives of H at $\mu$.
If H is smooth enough at $\mu$, and with some natural rearrangement of terms, a
valid asymptotic expansion of the same general form as the Edgeworth series
for sums is obtained. I call these series also Edgeworth series.

The construction would extend directly to the multivariate expansion of r 
functions of r sample moments if only the multivariate Edgeworth expansion 
for sums were valid. The expansion for the distribution of a single function of 
r moments could then be easily obtained as a marginal expansion. 

I know of no literature on any of these expansions for general functions. Hsu 
[45] and Chung [17] proved respectively that the sample variance and the one 
sample t-statistic have valid expansions. (There are several errors in Chung’s
explicit expansion, equation (35).) Hsu proved several results needed for proofs
for functions of any number of moments. But a very large amount of work was
involved in completing the proof for each separate function. Hsu stated that
students were working on other sample functions, but I know of no others published
except for a statement by Sun [62] that he had proved the result for the
third moment about the mean and a proof by Hsu [46] for the expansion of the 
distribution of the ratios of two independent means. 

A general result would be highly desirable or else an example of a statistic, 
smooth enough at the population value, but for which the series is not a valid 
asymptotic expansion to show that the construction described is not valid as 
generally as appears plausible. 

The expansions can be obtained formally by a different approach using the 
Charlier differential series identity (3.2) and the classical so-called δ-method for
calculating moments. Formally compute the moments and from them the cumulants
of the statistic $W_n$ of equation (4.1) by expanding $H(\bar{X})$ in a Taylor series
in $\bar{X} - \mu$ and integrating term by term. The formal cumulant expansions for
$W_n$ are of the form:

$$\kappa_1(W_n) = 0 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

$$\kappa_2(W_n) = 1 + O\!\left(\frac{1}{n}\right)$$

$$\kappa_r(W_n) = O\!\left(n^{-(r-2)/2}\right), \qquad r > 2$$

so that the leading terms behave exactly as for standardized sums of random
variables. If these formal cumulant expansions are substituted in the symbolic
identity (3.2), using the unit normal as the developing function, and if the
exponential operator is expanded formally and terms collected according to powers
of $n^{1/2}$, the same expansion as previously constructed is obtained.

This latter method is almost always easier to use in practice, especially for 
functions of several moments. Most applications of Edgeworth series use this 
method or some slight variation of it, such as using exact or valid expansions for 
the moments, which are frequently obtainable. 

The δ-method is often used to obtain formal asymptotic expressions for moments
and cumulants of statistics. A few examples of such use, for various purposes,
will be found in references [25], [26], [42], and [76]. The δ-method moment
expansions are known to be valid in some special cases. If a function of sample
moments is uniformly bounded by a power of the sample size, is smooth enough
at the population moments, and if enough population moments (far more than
apparently needed) exist, then the expansion can be proved valid by extending
Cramér’s proof [23, p. 354] for the leading terms of the mean and variance.
Under severe distributional assumptions (for example, for functions of (normal
theory) mean square variates; see section seven), the method can be shown valid.
But there are also examples where the method is not valid, and a wide range of 
applications in between. However, as long as the moments are used only to 
get distribution approximations, it is generally plausible and sometimes known 
to be true that the distribution approximations are valid whether the moment 
expansions are or are not. 

In many statistical applications, quantiles of a distribution are needed. From 
an Edgeworth expansion of a distribution function $F_n$, an asymptotic expansion
for a quantile x of $F_n$ in terms of the corresponding normal quantile z can be obtained
by formal substitutions, Taylor expansions, and identification of coefficients
of powers of n. The expansion is of the form

$$x = z + \frac{S_1(z)}{\sqrt{n}} + \frac{S_2(z)}{(\sqrt{n})^2} + \cdots$$

in which the $\{S_r\}$ are polynomials. The reverse expansion

$$z = x + \frac{R_1(x)}{\sqrt{n}} + \frac{R_2(x)}{(\sqrt{n})^2} + \cdots \tag{4.2}$$

is obtained as an intermediate step and is often useful in itself, giving an asymptotic
transformation of a variate x with distribution $F_n$ into a unit normal deviate.
An expansion of the type (4.2) is often called a normalization formula. Numeri- 
cally it serves the same purpose as the Edgeworth expansion but is often more 
convenient and possibly more accurate. 

Cornish and Fisher [18] carried out these inversions, treating each cumulant 
of $F_n$ according to the order of magnitude of its leading term as determined by
the δ-method. For the expansion of x in terms of z, they table, for seven common
probability levels, all the polynomials needed to obtain all terms through order
$1/n^2$, that is, using up through the sixth cumulants.
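
As a hedged illustration of the quantile expansion (added here, not taken from the Cornish and Fisher tables), the Python sketch below uses only the two leading polynomials in their standard form, $S_1(z) = \lambda_3 (z^2 - 1)/6$ and $S_2(z) = \lambda_4 (z^3 - 3z)/24 - \lambda_3^2 (2z^3 - 5z)/36$, for a sum of n independent components. It is applied to a chi-square variate with 10 degrees of freedom, regarded as a sum of 10 one-degree-of-freedom components with $\lambda_3 = 2\sqrt{2}$ and $\lambda_4 = 12$. The function names are hypothetical, and the closed-form chi-square distribution function used for checking holds only for even degrees of freedom.

```python
import math
from statistics import NormalDist

def cornish_fisher_quantile(p, n, mean, sd, lam3, lam4, order=2):
    """Approximate p-quantile of a sum of n i.i.d. components.

    mean, sd, lam3, lam4 are the mean, standard deviation and standardized
    third and fourth cumulants of a single component; order is 0, 1 or 2.
    """
    z = NormalDist().inv_cdf(p)
    s1 = lam3 * (z * z - 1.0) / 6.0
    s2 = lam4 * (z**3 - 3.0 * z) / 24.0 - lam3**2 * (2.0 * z**3 - 5.0 * z) / 36.0
    x = z
    if order >= 1:
        x += s1 / math.sqrt(n)
    if order >= 2:
        x += s2 / n
    return n * mean + math.sqrt(n) * sd * x   # undo the standardization

def chi2_cdf_even(x, df):
    """Exact chi-square distribution function for even degrees of freedom."""
    term, partial = 1.0, 1.0
    for j in range(1, df // 2):
        term *= (x / 2.0) / j
        partial += term
    return 1.0 - math.exp(-x / 2.0) * partial

df = 10
component = dict(mean=1.0, sd=math.sqrt(2.0), lam3=2.0 * math.sqrt(2.0), lam4=12.0)
q_normal = cornish_fisher_quantile(0.95, df, order=0, **component)
q_cf = cornish_fisher_quantile(0.95, df, order=2, **component)
print(round(q_normal, 3), round(chi2_cdf_even(q_normal, df), 4))
print(round(q_cf, 3), round(chi2_cdf_even(q_cf, df), 4))
```

The check is made by substituting each approximate quantile back into the exact distribution function, so no table of exact quantiles is needed.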

For an absolutely continuous distribution, both of the inverted series, which 
I will call Cornish-Fisher series, can be proved to be valid asymptotic expansions 
for every probability level, whenever the initial Edgeworth series is valid. I 
know of no published proof of this, though Wasow’s [73] proof of the invertibility
of a special class of distribution expansions can be modified and extended to 
work here. 

The Edgeworth and Cornish-Fisher approximations have some faults which 
show up in the tails of the distribution. The distribution function approximations 
are not probability distributions and both monotonicity and the 0-1 range property
are violated in parts of one or both tails. Similarly the quantile approximations
are not always monotone in the probability levels. These troubles do not
contradict the uniform validity of the Edgeworth expansion because it only 
refers to the absolute difference of two functions each approaching zero (or one) 
and not to the relative error. The validity of the Cornish-Fisher series is uniform 
for the probability level in each interior interval but the error increases as the 
level approaches 0 or 1. 
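
These tail faults are easy to exhibit. The Python sketch below is an illustration added here, not part of the original address. It evaluates the one-term Edgeworth approximation $\Phi(x) - \lambda_3 (x^2 - 1)\varphi(x)/(6\sqrt{n})$ on a grid, assuming n = 5 and $\lambda_3 = 2$ (the values for a sum of five unit exponential variables), and lists where the approximation is negative and where it is decreasing.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def edgeworth_one_term(x, n, lam3):
    """Normal term plus the first Edgeworth correction."""
    return Phi(x) - lam3 * (x * x - 1.0) * phi(x) / (6.0 * math.sqrt(n))

n, lam3 = 5, 2.0   # assumed example: standardized sum of five Exp(1) variables
grid = [i / 100.0 for i in range(-400, 401)]
values = [edgeworth_one_term(x, n, lam3) for x in grid]

# Grid points where the approximation falls outside the 0-1 range from below.
below_zero = [x for x, v in zip(grid, values) if v < 0.0]
# Grid points after which the approximation decreases.
decreasing = [grid[i] for i in range(len(grid) - 1) if values[i + 1] < values[i]]

print("negative for x between", min(below_zero), "and", max(below_zero))
print("decreasing for x between", min(decreasing), "and", max(decreasing))
```

The absolute errors in this region are still small, which is why these defects are compatible with uniform validity of the expansion.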

Cramér [22], and others [15], [32], [61] in important work have investigated 
the relative accuracy of the central limit theorem approximation, but for the 
Edgeworth and Cornish-Fisher approximations, the importance of these tail 
difficulties at present must be determined from empirical evidence. Some differ- 
ent expansions constructed to eliminate the tail difficulty will be discussed in 
section six for several specific distributions. 

There have been only a few numerical evaluations of the accuracy of these 
approximations, largely because of the difficulty of obtaining exact values for 
comparison. 

In a major piece of unpublished work, Teichroew [65] has used the terms of 
the Cornish-Fisher series through n™ to evaluate the quantiles of the normal 
theory chi-square distribution for a variety of degrees of freedom and probability 
levels. He has found that the accuracy of this approximation for four degrees of 
freedom, provided that the probability level is not in the extreme half of one 
percent, is at least three decimals with the accuracy improving rapidly as the 
degrees of freedom increase. Even for two degrees of freedom, the series is ac- 
curate to two decimals except in the extreme one percent. The series for the x’ 
is the most accurate application known. 

For the standardized sums of samples of size ten from four symmetrical nonnormal
populations, Chand [9] compared the exact quantiles with the Cornish-Fisher
approximations through orders 1/n and $1/n^2$. The latter gave better than
three decimal accuracy and the former better than two decimal accuracy for 
probability levels ranging from }% to 25%. 

Many more empirical studies of accuracy would be desirable including studies 
of the comparative accuracies of the Edgeworth expansion and the normaliza- 
tion expansion (4.2). 


5. Investigations of robustness. Asymptotic expansions play an important 
part in investigations of the effect of deviations from normality (or other population)
on the size and power of various tests. I use the null distribution of the
one-sample t-statistic as an example. Denote by $F_n$ and $G_n$ respectively the
general and normal population distributions of the t-statistic in samples of
size n. Formal Edgeworth expansions of $F_n$ and $G_n$ can be obtained and for the
t-statistic (but not otherwise) these have been proved valid. Since the difference
$F_n(x) - G_n(x)$ here is of interest, the difference of the two expansions provides
a valid asymptotic expansion for the deviation in terms of powers of $1/n^{1/2}$ and
of normal derivatives.


Effectively, the original Edgeworth series for $F_n$ has been replaced by one
in which the leading term is $G_n$ (assumed known). The approximations to $F_n$
are then exact for a normal population and greatly improved for “near normal”
populations. Similar modifications of successively higher order terms might be
expected to give improved accuracy, especially for small n. The possibility
(quite generally) of using expansions

$$F_n(x) \sim \sum_i B_i(x)\, H_i(x, n),$$

asymptotically equivalent (at every partial sum) to

$$F_n(x) \sim \sum_i A_i(x)\, n^{-i/2},$$

is a powerful tool to permit improved accuracy of expansions. There is no theory
on how to choose good functions $H_i$, but useful choices can often be made on
heuristic grounds or on the basis of a few computations.

In the t-statistic example, the expansion of $F_n$ in terms of successive derivatives
of the normal theory t-distribution might appear natural. Geary [39] obtained
such an expansion by formally applying the Charlier differential series
(equation 3.2) with $G_n$ as the generating distribution, collecting terms according
to their orders of magnitude. The result can be proved asymptotically equivalent
to the Edgeworth expansion and hence valid. Geary applies the same formal
method to an F-statistic (though even the formal derivation of the Charlier
identity is not valid) and Bartsch [3] applies the method to various
t-type statistics.

In the most substantial investigations of this kind, Gayen ([35], [36], [37],
[38]) has obtained a different asymptotically equivalent expansion for the distribution
of t (as well as for two-sample t, the variance ratio, and the correlation
coefficient). He has given extensive tables and graphs so his expansions are far 
more easily used than any alternative expansions. The expansions possess also 
a different asymptotic property. 

There seem to be no comparisons of the quality of the several approximations. 
Seemingly, the only feasible method for proving the validity of any of these ex- 
pansions is to show equivalence to the Edgeworth series and to prove it valid 
(if possible). This method would never lead to useful information on accuracy 
since the Edgeworth series is surely much less accurate than these modified 
expansions. 

Although Gayen’s expansions are asymptotically equivalent (in n) to the 
Edgeworth and other series, they have an additional property, not shared by the 
other series mentioned, of being a formal asymptotic expansion for any fixed 
finite n as the population “nonnormality”’ approaches zero. This is made definite 
by assuming that the population distribution itself can be expressed by an Edge- 
worth expansion in some unknown parameter m (i.e., that the population values 
themselves are the means of m independent ‘elementary errors’). The Gayen 
expansion is a formal asymptotic expansion in powers of $1/m^{1/2}$ (m does not need
to be known to write down the series). This approach seems conceptually more 
relevant to robustness problems than asymptotic expansions in the sample size. 
Theoretical study of the properties of these series would be desirable, as would
some comparative computations on various approximations. 


6. Quantile expansions for specific distributions. The expansions that have 
been considered have made use of only the moments or cumulants of a distribu- 
tion. Many useful asymptotic approximations have been developed from analytic 
expressions for the density function of the distribution to be approximated. As 
practically the only distributions known analytically, normal theory distributions 
are the object of most of these expansions. However, the normal distribution 
does not here play the central role that it does in the Edgeworth theory. 

Consider first the expansion of a quantile of one distribution of a convergent 
sequence in terms of the corresponding quantile of the limiting distribution or the 
reverse expansion. When the normal distribution is the limiting distribution, the 
results are necessarily exactly those given by the Cornish-Fisher expansions but 
use of the explicit analytic form greatly simplifies the derivation of higher order 
terms and proofs of validity. 

Let $\{f_n\}$ be a sequence of density functions which converges to a density
function $\psi$. The desired expansions are found as the solutions either for $t$
or for $z$ of the equation

$$\int_{-\infty}^{t} f_n(x)\,dx = \int_{-\infty}^{z} \psi(x)\,dx$$

or equivalently of the differential equation

$$f_n(t)\,\frac{dt}{dz} = \psi(z). \tag{6.1}$$


In 1923, Campbell [8] obtained a formal series solution of the differential equation
for the quantiles of the $\chi^2$ distribution in terms of those of the normal
distribution. He carried the series to ten terms beyond the normal approximation.
Teichroew [64] has tabled these polynomial terms and used them for the
computation described in section four.

Hotelling and Frankel [44] followed the same procedure to get four correction
terms for the transformation of a Student’s $t$ variate into a unit normal deviate
and also for the transformation of a Hotelling’s $T^2$ variate into a chi-square
variate. They proved the validity of the expansions.

Wasow [73] has given conditions on a sequence of distributions with a normal 
limiting distribution such that these expansions can be validly obtained by the 
natural formal methods, and further that each term will be a polynomial in the 
variate. 

The accuracy of these expansions decreases as the probability level becomes
more extreme. Consider the transformation of Student’s $t$ to a normal deviate.
It has the form

$$z = t\left\{1 + \frac{P_2(t)}{n} + \frac{P_4(t)}{n^2} + \cdots\right\}$$

in which the $\{P_i\}$ are even polynomials of the indicated order. Hotelling and
Frankel observed empirically that the series is of no value for $t^2$ greater than $n$.
Clearly, the expansion cannot be valid for $t$ of the order of $n^{1/2}$ since, from the
order of the polynomials, no term would approach zero with increasing $n$. The
usefulness of the series for small $n$ is severely limited.
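
The breakdown of the ordinary series for $t^2 > n$ can be checked numerically. The sketch below is only an illustration: it assumes the standard leading polynomials $P_2(t) = -(t^2+1)/4$ and $P_4(t) = (13t^4+8t^2+3)/96$, which are not quoted in the text, and it uses 2 degrees of freedom because the Student distribution function then has a closed form that needs no special-function library.

```python
import math
from statistics import NormalDist


def z_series(t, n):
    """Ordinary expansion z = t{1 + P2(t)/n + P4(t)/n^2}, two correction terms.

    The polynomial coefficients are the standard ones and are an assumption
    of this sketch; they are not taken from the text above.
    """
    p2 = -(t**2 + 1) / 4
    p4 = (13 * t**4 + 8 * t**2 + 3) / 96
    return t * (1 + p2 / n + p4 / n**2)


def z_exact_2df(t):
    """Exact equivalent normal deviate for Student's t with 2 d.f."""
    cdf = 0.5 + t / (2 * math.sqrt(2 + t * t))  # closed-form CDF for 2 d.f.
    return NormalDist().inv_cdf(cdf)


if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, round(z_series(t, 2), 4), round(z_exact_2df(t), 4))
```

With $n = 2$ the series is close to the exact deviate at $t = 1$ (where $t^2 < n$) and far from it at $t = 3$ (where $t^2 > n$), in line with the empirical observation quoted above.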

To obtain expansions useful in the tails of the distribution, Teichroew [66]
has considered a limiting process in which $t$ and $z$ both approach infinity with $n$.
His results are rather spectacular.

Set $t = bn^{1/2} + u$ with $b$ a constant for later choice and the variable $u$ to be
kept finite. Similarly, set $z = cn^{1/2} + v$. The choice $c = [\log(1 + b^2)]^{1/2}$ is
forced by examining leading terms in the differential equation (6.1) relating $z$
and $t$. The equation becomes an equation relating $u$ and $v$ and a formal expansion
of $v$ in terms of $u$ is easily, though tediously, obtained:

$$v = p_1(u) + \frac{p_2(u)}{\sqrt{n}} + \frac{p_3(u)}{n} + \cdots.$$

The $\{p_i(u)\}$ are polynomials of the indicated order, respectively odd and even.
The dependence on $b$ is very complicated. The whole procedure can be reversed,
treating $t$ as fixed and getting a series for $u$ in terms of $v$. In actual use, with a
given value of $t$ and $n$, $b$ would be chosen so that $u$ is made small or zero thus
keeping the polynomial terms small. If $u$ is made to be zero, all odd order
polynomials vanish. For 1 degree of freedom and a selection of $t$ values corresponding
to tail probability levels ranging from } to 10°", choosing b so that u is zero and 
using the first five non-zero terms, the approximation gives the equivalent normal 
deviate to better than two decimal places. The ordinary series is totally worthless. 

The first term is of interest. Taking $u = 0$, $b = t/n^{1/2}$, it is

$$\tilde z = \sqrt{n \log(1 + t^2/n)}.$$

This reduces to the usual normal approximation as $n$ approaches $\infty$ with $t$ fixed.
By direct analysis, Wallace [72] has shown that for all $t > 0$ and $n \ge 1$, it satisfies
the bounds

$$-\frac{0.368}{\sqrt{n}} \le z - \tilde z \le 0.$$

Knowing that the first term is correct to the indicated order, the entire ex- 
pansion can then be shown to be a valid asymptotic expansion, uniformly for u 
in any finite interval. No bounds are known beyond the first term. 
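
The behaviour of the first term in the extreme tail can be checked for 1 degree of freedom, where Student's distribution is the Cauchy distribution and the tail probability has a closed form. This is a minimal sketch under that assumption; the function names are illustrative only.

```python
import math
from statistics import NormalDist


def z_first_term(t, n):
    """First-term tail approximation sqrt(n log(1 + t^2/n))."""
    return math.sqrt(n * math.log1p(t * t / n))


def z_exact_1df(t):
    """Exact equivalent normal deviate for t > 0 with 1 d.f. (Cauchy).

    The upper tail 1/2 - atan(t)/pi is computed as atan(1/t)/pi to avoid
    cancellation when t is large.
    """
    tail = math.atan2(1.0, t) / math.pi
    return -NormalDist().inv_cdf(tail)


if __name__ == "__main__":
    for t in (1.0, 3.0, 10.0, 100.0, 1000.0):
        approx, exact = z_first_term(t, 1), z_exact_1df(t)
        print(t, round(approx, 4), round(exact, 4), round(approx - exact, 4))
```

Even at $t = 1000$ the first term overestimates the exact deviate by only about 0.3, whereas the ordinary series in powers of $1/n$ is useless at such values.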

Teichroew has treated the $\chi^2$ distribution in the same way with the same
spectacular results. Wallace has obtained a bound as with $t$ for the first term
approximation in the upper tail.


The method is applicable to many other distributions but I know of no further
applications.


7. Laplace’s method and studentization. Many calculations in statistics can
be reduced to the evaluation of the expected value of some function of a
mean-square variate:

$$E[f(v)] = c_n \int_0^\infty f(v)\, v^{n/2-1} e^{-nv/2}\,dv.$$

The integral here is a special case of the integral

$$\int g(u)\, e^{-nh(u)}\,du.$$


Its asymptotic evaluation by Laplace’s method is very important in the theory
of asymptotic expansions. If $g$ and $h$ are well-behaved functions, then for large $n$,
the integral except in the neighborhood of the minimum of $h$ is relatively
negligible to an exponential order in $n$. Valid asymptotic expansions can be obtained.

This integral evaluation is an important part of the method of steepest descent
([28], p. 38) in which the path of integration, considered in the complex plane, is
chosen to pass through a minimum of $h$ and in such a way that the absolute value
of the exponential $e^{-nh(u)}$ falls off most rapidly from its maximum. The integral
is then expanded by Laplace’s method.
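
A small numerical illustration of Laplace's method may help here. The example is an assumption chosen for convenience, not one from the text: with $g(u) = 1$ and $h(u) = u - \log u$ on $(0, \infty)$, the minimum of $h$ is at $u_0 = 1$ with $h(u_0) = h''(u_0) = 1$, the exact integral is $\Gamma(n+1)/n^{n+1}$, and the leading Laplace term $e^{-nh(u_0)}\sqrt{2\pi/(n h''(u_0))}$ is Stirling's formula.

```python
import math


def laplace_leading_term(n):
    """Leading Laplace term for integral of exp(-n(u - log u)) over (0, inf)."""
    return math.exp(-n) * math.sqrt(2 * math.pi / n)


def exact_integral(n):
    """Exact value Gamma(n+1) / n^(n+1), evaluated on the log scale."""
    return math.exp(math.lgamma(n + 1) - (n + 1) * math.log(n))


if __name__ == "__main__":
    for n in (5, 50, 500):
        ratio = exact_integral(n) / laplace_leading_term(n)
        print(n, ratio, 1 + 1 / (12 * n))
```

The ratio of the exact value to the leading term behaves like $1 + 1/(12n)$, which is the first correction term of the asymptotic expansion.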

The method of steepest descent (and not just the Laplace integral evaluation) 
has been used by Daniels [24] to obtain some interesting expansions that gen- 
eralize the Edgeworth expansions for sums. They have some superior properties 
but make use of explicit knowledge of the moment generating function. 

In the expansion of $E[f(v)]$, a simple application of the $\delta$-method is much more
convenient than a straight application of Laplace’s method (because of the
constant $c_n$ in the expansion for $E[f(v)]$). Expand $f(v)$ in a Taylor series about the
population value of $v$ (here equal to one) and integrate term by term. If the
expectation exists for sufficiently large $n$ and if $f$ has bounded derivatives near one,
then the expansion obtained is valid. Since the moments of $v$ about its mean
involve several powers of $1/n$, some rearrangement is needed to get an expansion
of the form

$$E[f(v)] \sim \sum_r \frac{a_r}{n^r}.$$

But for this last step, the development would have gone as well using a root
mean square variate $s = v^{1/2}$ as argument in the Taylor expansion and integration.
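
As a worked instance of this Taylor-series argument, take $f(v) = v^{1/2}$ (an example chosen here, not in the text). Since $E(v - 1) = 0$ and $\operatorname{Var}(v) = 2/n$, the expansion through order $1/n$ is $f(1) + f''(1)/n = 1 - 1/(4n)$, and the exact mean is available through gamma functions for comparison.

```python
import math


def exact_mean_root(n):
    """E[v^(1/2)] when n*v is chi-square with n degrees of freedom."""
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return math.sqrt(2.0 / n) * math.exp(log_ratio)


def delta_method_mean_root(n):
    """Expansion through order 1/n: f(1) + f''(1)/n with f(v) = sqrt(v)."""
    return 1.0 - 1.0 / (4.0 * n)


if __name__ == "__main__":
    for n in (5, 10, 100):
        print(n, exact_mean_root(n), delta_method_mean_root(n))
```

The discrepancy shrinks like $1/n^2$, as expected when the next term of the expansion is omitted.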

This expansion method and its natural extension to functions of several inde- 
pendent mean square variates are widely applicable in statistical work. They 
are unusually tractable for obtaining bounds on errors of approximation, but I 
am not aware of any such bounds. 

One important application is to finding the distribution function $H$ of a
studentized statistic $Y/v^{1/2}$ in which $Y/\sigma$ has the known distribution function $G$
and $v$ is an independent mean square estimate of the squared scale factor $\sigma^2$.
Then

$$H(x) = E\left[G\left(\frac{x\,v^{1/2}}{\sigma}\right)\right]$$

and its expansion is obtained as described. The terms in the expansion are all
linear functions of the unstudentized distribution function $G$ and its derivatives.
The expansion was first obtained, in a different way, by Hartley [43]. Moriguti
[56] developed the result as given here, except that he used the root mean square
as argument with a consequent unnecessary complication. (His error bound
(3.2) is incorrect).
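
For a normal $G$ the studentized statistic is Student's $t$, and the expansion through order $1/n$ can be written out and checked. The sketch below assumes $\sigma = 1$ and $G = \Phi$; differentiating $G(x v^{1/2})$ twice in $v$ at $v = 1$ and using $\operatorname{Var}(v) = 2/n$ gives $H(x) \approx \Phi(x) - (x^3 + x)\phi(x)/(4n)$. The comparison uses 2 degrees of freedom only because the exact distribution function is then elementary.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def studentized_cdf_approx(x, n):
    """Order-1/n expansion of E[Phi(x sqrt(v))] about v = 1."""
    return STD.cdf(x) - (x**3 + x) * STD.pdf(x) / (4 * n)


def t_cdf_2df(x):
    """Exact Student t distribution function for 2 degrees of freedom."""
    return 0.5 + x / (2 * math.sqrt(2 + x * x))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, studentized_cdf_approx(x, 2), t_cdf_2df(x))
```

Both terms are linear in $G$ and its derivatives, as stated above; even for $n = 2$ the one-term correction is within about 0.01 of the exact value at these points.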

Examples of the use of the expansion to get distributions of various studentized 
statistics are found in references [34], [58], [59], [60], and [69]. Ito [47] develops 
an example of a generalization to multivariate studentization. 


8. The Behrens-Fisher problem. Another application is part of the develop- 
ment of what is to me the most interesting use of asymptotic expansions: the 
Welch solution for the Behrens-Fisher problem and the various extensions and 
analogous treatments of problems like finding confidence limits for variance 
components or for weighted averages when the weights must be estimated. 

There have been a large number of papers attacking these problems, frequently 
repeating the same work ({1], [16], [57], [48], [49], [50], [67], [68], [71], [75], and 
others). Most of the work has consisted of formal expansions with no proofs 
that errors are really of their apparent order of magnitude and there has been 
some confusion as to what the expansions do provide. There have been a very 
few computations, and these very difficult, that indicate the accuracy of the 
approximations. 

I consider in some detail a reduced form of the Behrens-Fisher problem. Let
$Y$ be normally distributed with mean $\mu$ and variance $\sum \lambda_i \sigma_i^2$, with
$\{\lambda_i\}$ known positive constants and with the unknown variances $\sigma_i^2$ estimated
by independent mean square variates $s_i^2$ respectively with $n_i$ degrees of freedom.
The problem is to find a test of the hypothesis $\mu = 0$, which has significance
level $\alpha$ identically in the parameters $\sigma_i^2$.

The problem is already reduced to the sufficient statistics. Restrict it further
by considering only one-sided tests of the form: reject if $Y > h(s_i^2, \alpha, \lambda_i, n_i)$
with $h$ chosen so that $P(Y > h(s^2)) = \alpha$. The Welch solution consists in an
expansion

$$h(s^2) = h_0(s^2) + h_1(s^2) + h_2(s^2) + \cdots \tag{8.1}$$

in which $h_i(s^2)$ is of order $n^{-i}$ in the degrees of freedom.

To my knowledge, it is still not known whether a non-randomized similar
level $\alpha$ test exists. If there is no function $h$, the asymptotic expansion (8.1)
cannot be valid. But the expansion is still of value because it provides tests that
are asymptotically similar, that is, such that

$$P\Big(Y > \sum_{i=0}^{r-1} h_i(s^2)\Big) = \alpha + O(n^{-r}).$$

This interpretation of the asymptotic property was not clear in the original
papers and was the source of confusion. From the large sample result that
$Y/(\sum \lambda_i s_i^2)^{1/2}$ is asymptotically normally distributed the first term must be
$h_0(s^2) = z(\sum \lambda_i s_i^2)^{1/2}$ with $z$ the level $\alpha$ normal quantile. Further terms
are determined successively so that

$$P\Big(Y \le \sum_{i=0}^{r-1} h_i(s^2)\Big) = 1 - \alpha + O(n^{-r}).$$


For getting several terms the formal operator formula given by Welch is probably 
the most efficient procedure. The work is straightforward but Aspin [1] reported 
that 100 pages of detailed algebra were required to determine the term hg . 

I take a method that illustrates how a proof of validity could be given, but
determine only one term. Suppose first that $h_1(s^2)$ is any function of the variances
such that it and all its partial derivatives through order two are of order $1/n$.

Let $Q(\sigma^2) = (\sum \lambda_i \sigma_i^2)^{1/2}$ and let

$$u_1(s^2) = \frac{h_0(s^2) + h_1(s^2)}{Q(\sigma^2)}.$$

Then

$$P(Y \le h_0(s^2) + h_1(s^2)) = E\{\Phi(u_1(s^2))\}.$$

To evaluate $E\{\Phi(u_1(s^2))\}$ expand $\Phi(u_1(s^2))$ in a Taylor series in $u_1(s^2) - z$
and integrate with respect to the distributions of the $s_i^2$.

$$E\{\Phi(u_1(s^2))\} = 1 - \alpha + \phi(z)\,E[u_1(s^2) - z] + \frac{\Phi^{(2)}(z)}{2}\,E[u_1(s^2) - z]^2
+ \frac{\Phi^{(3)}(z)}{6}\,E[u_1(s^2) - z]^3 + c\,E[u_1(s^2) - z]^4.$$

Each of these integrals here is of the form discussed in section seven and is
validly expanded through formal Taylor expansions and termwise integration.
Carrying out the process far enough to get all terms of order $1/n$, and
remembering that $h_1$ and its derivatives are of order $1/n$, leads to the expression

$$\begin{aligned}
P(Y \le h_0(s^2) + h_1(s^2)) = 1 - \alpha &+ \phi(z)\,\frac{h_1(\sigma^2)}{Q(\sigma^2)} \\
&+ \sum_i \frac{2\sigma_i^4}{n_i}\left\{ \frac{z\phi(z)}{2Q(\sigma^2)} \left[\frac{\partial^2 Q(s^2)}{\partial (s_i^2)^2}\right]_{s^2=\sigma^2}
+ \frac{z^2\phi'(z)}{2Q^2(\sigma^2)} \left[\frac{\partial Q(s^2)}{\partial (s_i^2)}\right]^2_{s^2=\sigma^2} \right\}
+ O\!\left(\frac{1}{n^2}\right).
\end{aligned}$$

If $h_1$ is chosen to make the $1/n$ term vanish, it clearly has all the assumed
properties.

The determination and proof of validity for each additional term is essentially
the same.

The first approximation is

$$h_0(s^2) + h_1(s^2) = z\Big(\sum \lambda_i s_i^2\Big)^{1/2}\left[1 + \frac{1 + z^2}{4}\,
\frac{\sum \lambda_i^2 s_i^4/n_i}{\big(\sum \lambda_i s_i^2\big)^2}\right].$$

It is particularly interesting because it is equivalent, through terms of order
$1/n$, to a Student’s $t$ approximation using a degrees of freedom determined by
the $\{s_i^2\}$ and the $\{\lambda_i\}$ that was proposed much earlier by Welch [74].
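
The equivalence with a Student's $t$ approximation can be made concrete. The sketch below is an illustration under stated assumptions, not Welch's own procedure: it takes $h_1$ to be the term that makes the $1/n$ coefficient vanish for $Q(\sigma^2) = (\sum \lambda_i \sigma_i^2)^{1/2}$, and it uses the standard one-term expansion $t_f \approx z + (z^3 + z)/(4f)$ of a Student quantile with $f = (\sum \lambda_i s_i^2)^2 / \sum (\lambda_i^2 s_i^4 / n_i)$ degrees of freedom. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def welch_first_approx(s2, lam, df, alpha):
    """h0 + h1: critical value through terms of order 1/n (sketch)."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper level-alpha normal quantile
    q2 = sum(l * s for l, s in zip(lam, s2))
    ratio = sum((l * s) ** 2 / n for l, s, n in zip(lam, s2, df)) / q2**2
    return z * math.sqrt(q2) * (1 + (1 + z * z) / 4 * ratio)


def effective_df(s2, lam, df):
    """Degrees of freedom of the equivalent Student t approximation."""
    q2 = sum(l * s for l, s in zip(lam, s2))
    return q2**2 / sum((l * s) ** 2 / n for l, s, n in zip(lam, s2, df))


if __name__ == "__main__":
    s2, lam, df = [1.3, 0.7], [1.0, 1.0], [6, 6]
    f = effective_df(s2, lam, df)
    z = NormalDist().inv_cdf(0.95)
    t_like = math.sqrt(sum(l * s for l, s in zip(lam, s2))) * (z + (z**3 + z) / (4 * f))
    print(f, welch_first_approx(s2, lam, df, 0.05), t_like)
```

The two printed critical values agree identically, which is the sense in which the first approximation matches the Student approximation through terms of order $1/n$.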

Welch (appendix to [2]) has computed the true significance levels obtained 
using the expansion through h; for two variances, each with 6 degrees of freedom, 
and using a nominal level of .05. He found that the variation from .05 does not 
exceed .0002. This result seems quite satisfactory, but several more computations 
would be helpful in view of the importance of the procedure. 

The theory of the expansions used by Welch and others was given in a 1949 
paper by Chernoff [16]. None of the papers written on these subjects take any 
notice of the Chernoff work. He gives conditions for validity of expansions in 
an asymptotic studentization procedure due to Wald. Although his detailed 
results are for one nuisance parameter, he illustrates the extension to several 
nuisance parameters by essentially the same construction as here indicated for 
the Welch solution of the Behrens-Fisher problem. 

A straightforward application of the Chernoff results yields an asymptotic
series solution for a confidence interval for a variance component $\gamma$, where
estimates $v_1$ of $\sigma^2 + \gamma$ and $v_2$ of $\sigma^2$ are available. The expansion for this
problem was first developed and proved valid by Moriguti [57].

Most notable of the other work along this line is the work of James ([48],
[49], [50]), who has extended the Welch formal expansion to univariate and
multivariate tests of general linear hypotheses with unknown and unequal
variances and covariances as nuisance parameters.


9. Conclusion. I have by no means covered all the interesting and important 
work on asymptotic approximations and have not even considered any non- 
asymptotic approaches to approximations. I have discussed what are to me some 
of the interesting problems, attacks, and results. Much more work is needed,
particularly theoretical and empirical studies of the qualities of the
approximations.


A COMPARATIVE STUDY OF SEVERAL ONE-SIDED
GOODNESS-OF-FIT TESTS¹


By Douglas G. Chapman
University of Washington

0. Summary. Criteria for evaluating goodness-of-fit tests are reviewed and 
two additional criteria proposed. The several goodness-of-fit tests which have 
been proposed are studied in the light of these criteria. It is shown that it is rela- 
tively easy to evaluate the maximum and minimum power of those tests which 
are “partially ordered” against alternatives at a fixed “distance” from the hy- 
pothesis. A comparison is made of five tests on the basis of such minimum and 
maximum power functions. 


1. Introduction. Let $X$ be a real random variable with d.f. $F \in \Omega$, the class of
continuous distribution functions (d.f.) on $R$. The aim of this paper is a
comparative study of some of the distribution-free tests of the hypothesis

$$H_0: F = F_0$$

(where $F_0$ is completely specified), against the alternative

$$F < F_0.$$

The class of distributions belonging to $\Omega$ that are less than $F_0$ will be denoted
by $\omega$. (A distribution $F$ is less than $F_0$ if $F(x) \le F_0(x)$ everywhere with the strict
inequality holding on a set of positive $F_0$-measure.) Birnbaum and Scheuer [7]
have called this problem that of testing goodness-of-fit against stochastically
comparable alternatives. A list of a number of tests for this situation and for the
case where the set of alternatives is $F \in \Omega$, $F \ne F_0$, as well as some of the
considerations involved in designing such tests, have been given by Birnbaum [4].

If the goodness-of-fit test is merely a preliminary test to justify assumptions 
made for the purpose of further tests, its usefulness at the present time is
debatable. As yet not enough is known of the effects of different types of deviations
from assumptions on the behavior of statistical tests and estimates, nor of the 
effects of preliminary tests. Box and Andersen [9], however, have given examples 
which seem to indicate that the use of a preliminary test may leave the statis- 
tician in a less satisfactory position than if no preliminary test were made. 

On the other hand the goodness-of-fit test is quite reasonable in validating a
theoretical model. Moreover $F$, or functions of $F$, may enter into further
developments of the whole problem so that it is desirable to have an explicit
representation for it.

In many statistical applications tests are made for changes in a mean; equally
well changes in the whole distribution may be of interest. In this case one-sided
as well as two-sided alternatives to $H_0$ could be of interest to the statistician.

In [4] Birnbaum noted that it is desirable to introduce a metric into the space
of distributions and he suggested a number of possibilities. The choice of the
metric is to a large extent a metastatistical consideration. However, the metric

$$\rho(F, G) = \sup_{-\infty < x < \infty} |F(x) - G(x)|$$

or in the one-sided case

$$\rho^{+}(F, G) = \sup_{-\infty < x < \infty} (F(x) - G(x))$$

has been used extensively in probability and statistics. Furthermore these metrics
seem appropriate in several of the situations discussed above where a test of $H_0$
is reasonable. We will consider only these distance functions and more especially
the second which is appropriate to stochastically comparable alternatives. This
study will be limited to those tests which have been proposed for this problem
and for which the distribution theory of the test under the null hypothesis is
known at least for the asymptotic case. For those tests that satisfy certain weak
criteria, the maximum and minimum large sample power for alternatives whose
distance from the hypothesis is equal to $\Delta$, is determined. This approach of
finding sharp upper and lower bounds for the power of a test for such alternatives
was introduced by Birnbaum in [5].

The almost standard notation

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$$

and $Z_\alpha$ for the root of the equation

$$\Phi(x) = \alpha$$

will be used.

$E_0[f(X)]$ and $E_G[f(X)]$ will denote the expectation of the random function
$f(X)$ when $X$ has the distributions $F_0$ and $G$ respectively.


2. Criteria for tests of $H_0$. A test of $H_0$ of size $\alpha$ is a measurable function
$\phi(X_1, \cdots, X_n)$ or $\phi_n$ for short, from $R_n$ to the interval (0, 1) such that

$$E_0(\phi_n) \le \alpha.$$

Consider the alternative $G \in \omega$. The power function $E_G(\phi_n)$ will be denoted
$\beta_{\phi_n}(G)$.

The properties of admissibility, consistency and unbiasedness for a test are
well known. We refer to Birnbaum and Rubin [6] for the concepts of tests of
structure (d), distribution-free and strongly distribution-free tests, and recall
that they showed that for all strictly monotone distributions in $\Omega$, tests of
structure (d) are strongly distribution free and conversely.

Since all tests we will consider are of structure (d) we may consider the
problem in its canonical form, i.e., where $F_0$ is the uniform distribution on the
interval (0, 1) and all distributions of $\omega$ are restricted to the unit interval.

To emphasize this, it will be convenient to let $u$ be a sure number in (0, 1) and
$U$ an r.v. uniformly distributed in (0, 1). It will also be convenient to denote by
$U_1, U_2, \cdots, U_n$ the ordered sample from this distribution. In some instances it
will be convenient to introduce $U_0$ and $U_{n+1}$. These are set equal to 0 and 1
respectively.

We also introduce two more concepts, monotonicity and partial ordering, as
applied to tests of the hypothesis $H_0$.

DEFINITION 1. $\phi$ is a monotone test of $H_0$ if

$$U_i \ge V_i \ (i = 1, 2, \cdots, n) \ \Rightarrow\ \phi(U_1, U_2, \cdots, U_n) \ge \phi(V_1, V_2, \cdots, V_n). \tag{1}$$

DEFINITION 2. $\phi$ is a partially ordered [p.o.] test of $H_0$ if

$$G_1(u) \le G_2(u) \text{ for all } u \in (0, 1) \ \Rightarrow\ \beta_\phi(G_1) \ge \beta_\phi(G_2). \tag{2}$$


From the continuity theorem for Lebesgue-Stieltjes integrals we have the
following obvious

Remark. If $\phi$ is continuous except for a finite number of jumps and $\phi$ is p.o.
then $\phi$ is unbiased.

The relationship between monotonicity and partial ordering will be useful 
later. 

THEOREM. Tests of structure (d) that are monotone are p.o.

PROOF. Let $G_1(u) \le G_2(u) \le u$ and recall

$$\beta_\phi(G_i) = \int_0^1 \int_0^1 \cdots \int_0^1 \phi(u_1, u_2, \cdots, u_n) \prod_{j=1}^{n} dG_i(u_j).$$

Make the change of variables

$$y_j = G_i(u_j) \qquad (j = 1, 2, \cdots, n)\ (i = 1, 2)$$

in the two integrals. The inverse is defined in the usual fashion, i.e.,

$$u_j = G_i^{-1}(y_j) = \inf_{0 \le x \le 1}\,[x : G_i(x) = y_j].$$

The two integrals become

$$\int_0^1 \int_0^1 \cdots \int_0^1 \phi[G_i^{-1}(y_1), \cdots, G_i^{-1}(y_n)] \prod_{j=1}^{n} dy_j. \tag{4}$$

Since $G_1 \le G_2$, $G_1^{-1} \ge G_2^{-1}$; this, together with the monotonicity property of
$\phi$, implies the required inequality for $\beta_\phi(G_1)$, $\beta_\phi(G_2)$.

It may also be noted that any monotone test is admissible. This follows from a
result of A. Birnbaum [2] (appendix) who considered this problem where the set
of alternatives is restricted to d.f. with monotone densities. In this paper we
determine which of the several tests of $H_0$ that have been suggested satisfy these
criteria and then determine

$$\underline{\beta}_\phi(\Delta) = \inf_{G \in \omega(\Delta)} \beta_\phi(G); \qquad
\bar{\beta}_\phi(\Delta) = \sup_{G \in \omega(\Delta)} \beta_\phi(G)$$

where

$$\omega(\Delta) = [G : G \in \omega,\ \rho^{+}(F_0, G) = \Delta],$$

for these several test functions $\phi$.

To obtain sharp upper and lower bounds of the power of any p.o. test against
all alternatives $G$ such that

$$\rho^{+}(F_0, G) = \Delta \tag{5}$$

we consider the alternatives

$$G_{mu_0}(u) = \begin{cases} u, & 0 \le u < u_0, \\ u_0, & u_0 \le u < u_0 + \Delta, \\ u, & u_0 + \Delta \le u \le 1, \end{cases}
\qquad
G_M(u) = \begin{cases} 0, & 0 \le u < \Delta, \\ u - \Delta, & \Delta \le u < 1, \\ 1, & u = 1. \end{cases}$$

These distributions are not members of the family of alternatives $\omega$, but it is
possible to find distributions in $\omega$ arbitrarily close to $G_{mu_0}$ or $G_M$. Hence it follows
from the continuity of the power functions, that if the test is p.o.

$$\underline{\beta}_\phi(\Delta) = \inf_{0 \le u_0 \le 1 - \Delta} \beta_\phi(G_{mu_0}); \qquad
\bar{\beta}_\phi(\Delta) = \beta_\phi(G_M).$$

Such bounds are given below for several of the tests of $H_0$ that meet the criteria
of admissibility, consistency, unbiasedness, monotonicity and partial orderedness.


3. Fisher and Pearson tests. The statistics

$$\pi = -2 \sum_{i=1}^{n} \ln U_i, \tag{8}$$

$$\pi' = -2 \sum_{i=1}^{n} \ln (1 - U_i) \tag{9}$$

were introduced in the problem of combining tests but are also suitable for
testing $H_0$. If $H_0$ is true $\pi$ and $\pi'$ both are distributed as $\chi^2$ with $2n$ d.f.
Furthermore, the u.m.p. test of $H_0$ against the family of alternatives

$$G_a(u) = u^a \tag{10}$$

is obviously of the form: Reject $H_0$ if $\pi < c$. A similar statement may be made
about $\pi'$.

Furthermore, such tests are obviously monotone and hence p.o.

It will be convenient to refer to the tests, reject $H_0$ if $\pi < c$ or $\pi' > c$, simply
as the tests $\pi$, $\pi'$. These are two of the class of likelihood ratio tests of the form:
Reject $H_0$ if $\sum_{i=1}^{n} \ln g_1(U_i) > c$, where $g_1$ is the derivative of a specified
absolutely continuous alternative $G_1$.

If $E_0[\ln g_1(U)]^2 < \infty$, this test statistic is asymptotically normal and
furthermore if $E_G[\ln g_1(U)]^2 < \infty$ and $E_0[\ln g_1(U)] < E_G[\ln g_1(U)]$ the usual argument
shows that the test based on $\sum_{i=1}^{n} \ln g_1(U_i)$ is consistent for testing $H_0$ against
the alternative $G$.

In particular for the tests $\pi$, $\pi'$, we have

THEOREM. The tests $\pi$, $\pi'$ are consistent for the set of alternatives $\omega$.

PROOF. In view of the remark above it is necessary to show that $E_0[\ln U]^2$,
$E_G[\ln U]^2$ are finite and $E_0[\ln U] < E_G[\ln U]$. Now

$$E_G[\ln U]^2 = \int_0^1 (\ln u)^2\,dG(u). \tag{11}$$

Let $1 > \varepsilon > 0$; for every $\varepsilon$

$$\int_\varepsilon^1 (\ln u)^2\,dG(u) = (\ln u)^2 G(u)\Big|_\varepsilon^1 + 2 \int_\varepsilon^1 \frac{|\ln u|}{u}\,G(u)\,du. \tag{12}$$

Since $G(u) \le u$ the first term on the right-hand side of (12) can be made
arbitrarily small by appropriate choice of $\varepsilon$ while for all $\varepsilon$ the second integral is
bounded by

$$\int_0^1 |\ln u|\,du = 1.$$

This shows that both $E_0[\ln U]^2$, $E_G[\ln U]^2$ exist and also validates the
integration by parts in the next step.
For

$$E_G(\ln U) = \int_0^1 (\ln u)\,dG(u) = \ln u\,G(u)\Big|_0^1 - \int_0^1 \frac{G(u)}{u}\,du. \tag{13}$$

The first term on the right-hand side of (13) is zero. Since $G(u) \le u$ with
inequality holding on a set of positive measure

$$-\int_0^1 \frac{G(u)}{u}\,du > -\int_0^1 du = -1 = E_0(\ln U)$$

as required for the consistency of the $\pi$ test.
The proof of the consistency of $\pi'$ requires consideration of two cases.

CASE 1. $E_G[\ln (1 - U)] > -\infty$.

Since $G$ is continuous and $G(u) < u$ on a set of positive measure, $\exists\ \varepsilon'$ such that

$$\int_0^{1-\varepsilon'} \frac{u - G(u)}{1 - u}\,du = 2\delta > 0.$$

Now in view of the finiteness of $E_G[\ln (1 - U)]$, $\exists\ \varepsilon$ which may be chosen less
than $\delta/3$ and $\varepsilon'$ such that

$$|\ln \varepsilon|\,[1 - G(1 - \varepsilon)] < \delta/3$$

and also

$$\left| \int_0^1 \ln (1 - u)\,dG(u) - \int_0^{1-\varepsilon} \ln (1 - u)\,dG(u) \right| < \delta/3.$$

Now

$$\begin{aligned}
\int_0^1 \ln (1 - u)\,dG(u) &< \int_0^{1-\varepsilon} \ln (1 - u)\,dG(u) + \delta/3 \\
&= G(1 - \varepsilon) \ln \varepsilon + \int_0^{1-\varepsilon} \frac{G(u)}{1 - u}\,du + \delta/3 \\
&= G(1 - \varepsilon) \ln \varepsilon + \int_0^{1-\varepsilon} \frac{u}{1 - u}\,du - \int_0^{1-\varepsilon} \frac{u - G(u)}{1 - u}\,du + \delta/3 \\
&\le -1 + \varepsilon + |\ln \varepsilon|\,[1 - G(1 - \varepsilon)] - 2\delta + \delta/3 < -1 - \delta.
\end{aligned}$$

Since the critical region of the $\pi'$ test converges to: Reject $H_0$ if

$$-\frac{\pi'}{2n} < -1 + \frac{Z_\alpha}{\sqrt{n}}$$

while by Khintchine’s theorem $-(\pi'/2n)$ converges almost surely to
$E_G[\ln (1 - U)] \le -1 - \delta$ under the alternative $G$, the consistency follows.

CASE 2. $E_G[\ln (1 - U)] = -\infty$.
By well-known results in this case infinitely many of the sequence of the
independent r.v. $\sum_{i=1}^{n} \ln (1 - U_i)$, $n = 1, 2, 3, \cdots$, are with probability 1 less than
$nA$ for any arbitrary $A$. Hence from the remark on the critical region the
consistency is immediate.
As a consequence of this theorem it may be noted that $\pi$ is asymptotically
normal both under $H_0$ and all alternatives in $\omega$; it is trivial to give examples that
this is not true for $\pi'$. This behavior is reversed for alternatives $G(u) \ge u$.

The asymptotic normality of $\pi$ permits an elementary derivation of
$\underline{\beta}_\pi(\Delta)$ and $\bar{\beta}_\pi(\Delta)$ for large samples. In particular

$$E_M(\ln U) = -[1 - \Delta(1 - \ln \Delta)], \tag{14}$$

$$\sigma_M^2(\ln U) = 1 - \Delta^2 + 2\Delta^2 \ln \Delta - (\ln^2 \Delta)(\Delta + \Delta^2), \tag{15}$$

and

$$E_{mu_0}(\ln U) = -\left[1 - \Delta + u_0 \ln \left(\frac{u_0 + \Delta}{u_0}\right)\right] \tag{16}$$

with

$$\max_{u_0} |E_{mu_0}(\ln U)| = (1 - \Delta)[1 - \ln (1 - \Delta)]. \tag{17}$$

This maximum is attained when $u_0 = 1 - \Delta$. Also

$$E_{mu_0}(\ln U)^2 = 2(1 - \Delta) - u_0[\ln^2 (u_0 + \Delta) - \ln^2 u_0]
+ 2(u_0 + \Delta) \ln (u_0 + \Delta) - 2u_0 \ln u_0. \tag{18}$$

A numerical study of the variance of $\ln U$ as a function of $u_0$ shows that the
variance is maximized when $u_0 = 1 - \Delta$ though the changes with respect to $u_0$ are
very slight.

For $u_0 = 1 - \Delta$

$$\sigma_m^2(\ln U) = (1 - \Delta)[2 - 2 \ln (1 - \Delta) + \ln^2 (1 - \Delta)] - (1 - \Delta)^2 [1 - \ln (1 - \Delta)]^2. \tag{19}$$

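Hence approximately for large $n$

$$\underline{\beta}_\pi(\Delta) = \Phi\left( \frac{Z_\alpha + \sqrt{n}\,[(1 - \Delta) \ln (1 - \Delta) + \Delta]}{\sigma_m(\ln U)} \right), \tag{20}$$

$$\bar{\beta}_\pi(\Delta) = \Phi\left( \frac{Z_\alpha + \sqrt{n}\,[\Delta(1 - \ln \Delta)]}{[1 - \Delta^2 + 2\Delta^2 \ln \Delta - \ln^2 \Delta\,(\Delta + \Delta^2)]^{1/2}} \right). \tag{21}$$

The minimum power of the $\pi'$ test is attained against the alternative $G_{m0}$,
i.e., the jump of height $\Delta$ is located at $u = \Delta$. Furthermore this minimum power
is the same as the minimum power of the $\pi$ test.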
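
The large-sample bounds for the $\pi$ test are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: it takes the extreme alternatives to be $G_{mu_0}$ with $u_0 = 1 - \Delta$ (uniform except for a flat stretch of length $\Delta$ ending in a jump) and $G_M$ (the uniform shifted by $\Delta$ with mass $\Delta$ at 1), computes the mean and variance of $\ln U$ under each in closed form, and applies the normal approximation with $\Phi(Z_\alpha) = \alpha$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def pi_min_power(n, delta, alpha):
    """Large-sample minimum power of the pi test at distance delta."""
    log1m = math.log(1 - delta)
    shift = (1 - delta) * log1m + delta          # E(ln U) + 1 under G_m, u0 = 1 - delta
    var = (1 - delta) * (2 - 2 * log1m + log1m**2) - (1 - delta) ** 2 * (1 - log1m) ** 2
    return STD.cdf((STD.inv_cdf(alpha) + math.sqrt(n) * shift) / math.sqrt(var))


def pi_max_power(n, delta, alpha):
    """Large-sample maximum power of the pi test at distance delta."""
    logd = math.log(delta)
    shift = delta * (1 - logd)                   # E(ln U) + 1 under G_M
    var = 1 - delta**2 + 2 * delta**2 * logd - logd**2 * (delta + delta**2)
    return STD.cdf((STD.inv_cdf(alpha) + math.sqrt(n) * shift) / math.sqrt(var))


if __name__ == "__main__":
    for delta in (0.05, 0.10, 0.20):
        print(delta, pi_min_power(100, delta, 0.05), pi_max_power(100, delta, 0.05))
```

For $n = 100$ and $\Delta = 0.10$ the minimum power is barely above the size of the test while the maximum power is close to one, which shows how wide the gap between the two bounds can be for this statistic.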

On the other hand $\pi'$ will not be asymptotically normally distributed for
$G_M$; in fact with probability $1 - (1 - \Delta)^n$, $\pi' = +\infty$ in which case rejection is
immediate. However, under the condition that all the $U_i$ are less than 1, $\pi'$ is
asymptotically normal so that


(22) B(A) = 1 — (1 — A)"{1 — $(x)], 
where 
, —AlndA 
La 
ae ¥"i- 0 
-? r= 


, Aln’A |" 
(1 — A) 
Tables giving numerical values of these minimum and maximum power func- 


tions are displayed in section 8 below where the several tests are compared. 


4. $D_n^-$ test. The empirical d.f. $F_n(u)$ is basic in many distribution-free tests of
$H_0$. The use of the statistic

$$D_n^- = \sup_{0 \le u \le 1}\,[u - F_n(u)] \tag{24}$$

as a large sample test for $H_0$ became possible after Smirnov [20] obtained its
limiting distribution. Subsequently Birnbaum and Tingey [3] gave a closed
expression for the distribution of $D_n^-$ for finite $n$.

These results are

$$\lim_{n \to \infty} \Pr[\sqrt{n}\,D_n^- \le Z] = 1 - e^{-2Z^2} \tag{25}$$

and

$$\Pr[D_n^- < \varepsilon] = 1 - \varepsilon \sum_{j=0}^{[n(1-\varepsilon)]} \binom{n}{j} \left(1 - \varepsilon - \frac{j}{n}\right)^{n-j} \left(\varepsilon + \frac{j}{n}\right)^{j-1} \tag{26}$$

where as usual $[x]$ is the greatest integer contained in $x$.

It is immediate from the definition that the test is monotone and hence p.o.
and admissible, as well as being consistent.
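
The finite-sample and limiting distributions can be compared directly. The following sketch evaluates the closed sum for $\Pr[D_n^- < \varepsilon]$ under $H_0$ and the limiting form $1 - e^{-2Z^2}$ with $Z = \sqrt{n}\,\varepsilon$; the function names are illustrative, and the small guard against negative bases is only there to absorb floating-point rounding.

```python
import math


def smirnov_limit_cdf(z):
    """Limiting value of Pr[sqrt(n) D_n <= z] under H0."""
    return 1.0 - math.exp(-2.0 * z * z) if z > 0 else 0.0


def finite_sample_cdf(eps, n):
    """Pr[D_n < eps] under H0 from the closed finite-n sum."""
    if eps <= 0:
        return 0.0
    if eps >= 1:
        return 1.0
    total = 0.0
    for j in range(int(math.floor(n * (1 - eps))) + 1):
        a = max(0.0, 1 - eps - j / n)  # guard against tiny negative rounding
        b = eps + j / n
        total += math.comb(n, j) * a ** (n - j) * b ** (j - 1)
    return 1.0 - eps * total


if __name__ == "__main__":
    for n in (10, 50, 100):
        eps = 1.0 / math.sqrt(n)  # corresponds to Z = 1
        print(n, finite_sample_cdf(eps, n), smirnov_limit_cdf(1.0))
```

For $n = 1$ the statistic equals the single observation, so the sum must return $\varepsilon$ itself, which gives a quick sanity check on the formula.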

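Birnbaum in [5] gave upper and lower bounds for the power of the $D_n^-$ test for
alternatives of fixed distance $\Delta$ within the class of all continuous distribution
functions. The upper bound is attained for the alternative labeled here $G_M$ and
we quote his result

$$\bar{\beta}_{D_n^-}(\Delta) = \begin{cases}
(\varepsilon_n - \Delta) \displaystyle\sum_{j=0}^{[n(1-\varepsilon_n+\Delta)]} \binom{n}{j} \left(1 - \varepsilon_n + \Delta - \frac{j}{n}\right)^{n-j} \left(\varepsilon_n - \Delta + \frac{j}{n}\right)^{j-1} & \text{for } \varepsilon_n \ge \Delta, \\
1 & \text{for } \varepsilon_n < \Delta,
\end{cases} \tag{27}$$

where $\varepsilon_n$ is chosen so that

$$\Pr[D_n^- \ge \varepsilon_n \mid H_0] = \alpha.$$

In view of Smirnov’s result for large $n$

$$\bar{\beta}_{D_n^-}(\Delta) \doteq e^{-2n(\varepsilon_n - \Delta)^2} \quad \text{for } \varepsilon_n \ge \Delta.$$

The lower bound of the power of the $D_n^-$ test within the class of stochastically
comparable alternatives was studied by Birnbaum and Scheuer [7]. Their result
is given as a number of double and triple sums of terms of the same type as those
in (26), and is not in a form useful for comparison or evaluation purposes.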

The following approach does not yield a simple closed expression for the exact
power, but an adequate approximation is obtained. We write

$$\begin{aligned}
\beta(G_{mu_0}) ={}& \Pr[u_0 + \Delta - F_n(u_0 + \Delta - 0) \ge \varepsilon_n \mid G_{mu_0}] \\
&+ \Pr\Big[\sup_{0 \le u < u_0} \{u - F_n(u)\} \ge \varepsilon_n \,\Big|\, u_0 + \Delta - F_n(u_0 + \Delta - 0) < \varepsilon_n,\ G_{mu_0}\Big] \\
&+ \Pr\Big[\sup_{u_0 + \Delta \le u < 1} \{u - F_n(u)\} \ge \varepsilon_n \,\Big|\, \sup_{0 \le u < u_0 + \Delta} \{u - F_n(u)\} < \varepsilon_n,\ G_{mu_0}\Big].
\end{aligned} \tag{28}$$

It will be convenient to symbolize the three terms on the right hand of (28) by
$P_1$, $P_2$, $P_3$ respectively. It is immediate that

$$P_1 = \sum_{k=0}^{[n(u_0 + \Delta - \varepsilon_n)]} B(k; n, u_0), \tag{29}$$

where the right-hand summands denote binomial probabilities in the usual
notation.
An examination of the integral representation of

$$P_n(\varepsilon) = \Pr\Big[\sup_{0 \le u \le 1} \{u - F_n(u)\} \le \varepsilon\Big]$$

given by Birnbaum and Tingey in [3]


ae lin)+e 2 ni+e 
P,,(e n! | [ ee 
Jo Jz, Ss 


° da K+ » de +1 eee diz dix dx, ° 


where $K = [n(1 - \varepsilon)]$, shows immediately that $P_2$ and $P_3$ are bounded by $\alpha$.
Hence the dominant term in $\beta(G_{mu_0})$ is $P_1$ which is minimized when $u_0 = \tfrac{1}{2}$.
This value has been used in making minimum power calculations for the $D_n^-$
test.
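
The dominant term is a plain binomial sum and is simple to compute. In the sketch below, which is illustrative only, the critical value $\varepsilon_n$ is taken from the large-sample relation $e^{-2n\varepsilon_n^2} = \alpha$ rather than from the exact finite-$n$ distribution, and the function names are hypothetical.

```python
import math


def critical_eps(n, alpha):
    """Large-sample critical value solving exp(-2 n eps^2) = alpha."""
    return math.sqrt(-math.log(alpha) / (2 * n))


def dominant_term(n, u0, delta, eps):
    """Sum of binomial probabilities B(k; n, u0) for k <= n(u0 + delta - eps)."""
    kmax = math.floor(n * (u0 + delta - eps))
    if kmax < 0:
        return 0.0
    kmax = min(kmax, n)
    return sum(math.comb(n, k) * u0**k * (1 - u0) ** (n - k) for k in range(kmax + 1))


if __name__ == "__main__":
    n, delta, alpha = 100, 0.2, 0.05
    eps = critical_eps(n, alpha)
    for u0 in (0.2, 0.35, 0.5, 0.65, 0.8):
        print(u0, round(dominant_term(n, u0, delta, eps), 4))
```

With $n = 100$, $\Delta = 0.2$ and $\alpha = 0.05$ the printed values are smallest at $u_0 = \tfrac{1}{2}$, where the binomial variance is largest.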

However the actual values of $P_2$ can be determined in the large sample case.
Consider


uf" f 


. . 


1 
| du, eee digs digs eee dus duy , 


Uk—1 
where $K' = [n(u_0 - \varepsilon_n)]$.

The integral form can be written down in a similar manner to that of $P_n(\varepsilon)$ and
the result given is obtained by a trivial change of variable. By a slight extension
of the arguments used by Birnbaum and Tingey this can be expressed as a closed
sum, viz

ne kV BS (k\fnuw ne i ne, , 
, - ea axa oe 
- ( k X= ) 2» (“)( Ik: I k ( gk Ok 


It is convenient to denote the function on the right-hand side
$V(nu_0/k,\ n\varepsilon_n/k,\ k)$.
The power of the $D_n^-$ test against alternatives of the form


fau, Cos 2 €£ Gc a < ow, 


l, 


G,(u) = ¢ 


can be expressed in terms of the function $V$. This can be seen by writing down the
integral using the general power formula given by Birnbaum ([5], p. 486) or by a
simple direct argument. In fact

(32)

While the sums in (31) can be evaluated by a straightforward process, the
process is tedious for large $k$, and we obtain instead an asymptotic result that
yields a method of approximating $V$ in this situation.

Let Gra(u) denote a sequence of d.f. of the form 


where 


b = min (1 + = P ), 
Vn 
(34) Bor(Gna) = pr sup ¢ + = yn — F.(u) > «| 
Losu<b Vn 


Now we use Donsker’s theorem [11] justifying Doob’s heuristic approach to the
Kolmogorov-Smirnov theorems [12] to validate the following steps:


lim Pr{ sup {Vn(u — F,(u) + au) > Z| 
(35) nee  Osucd 


= Pr[ sup (X(u) + au) > Z], 
( u<l 


where $X(u)$ is a Gaussian process with the properties noted by Doob ([12], p.
397). Further, the transformation he made and his evaluation of

$$\Pr\{\sup\,[\zeta(t) - (at + b)] \ge 0\}$$

may be used to evaluate this last probability. We have in fact


| ; ; ((u) 4 ; 
Pr | sup [X (uw) + au|> Z| - pr] sup gu) + au > z | 


- ut 1) 


(56) 0 


In other words, putting e, Z/ Vn 


; ; ~~ Z 
lim J (: + —=,-5 .n) 
now Vn Vn 


Henceifn,k— © with Vk [(n/2k) — 1] - 2h) /2\/k and n/k: remaining 


finite and if uo is set equal to 4 


: - € . , Z i 
| | n , NE, 2) es n ; a 1) 
VW iCHe) 2] 
k Vv k
so that an approximate evaluation of $P_2$ is given by


= P én ne’, 
(38) P. = pm Blk; n, 3) | exp { — + 2ne, — : 
k= [n(4+A—e,)] m k k 


This formula was used to evaluate $P_2$ for a number of values of $n$ and $\Delta$. These
are shown in Table 1. The striking feature of the table is the negligible size of $P_2$.


Further, by making the change of variable W = 1 — U and noting that 
(39) Ps; Pr[ sup [u — F,(u)] = €|uo — Frluo) < en, Gmugl 
igtAcucl 
it is obvious that for wo 1 P3 Ss Po. 


Hence $\min_{u_0} \beta_{D_n^-}(G_{m u_0})$ is bounded between $P_1 + P_2$ and $P_1 + 2P_2$ for large
samples.


5. Tests related to the $D_n^-$ test. Anderson and Darling [1] considered a class of
tests based on the more general distance function


$\sup_{-\infty < x < \infty} \sqrt{n}\,|F_n(x) - F(x)|\,\psi[F(x)],$


where $\psi$ is a non-negative weight function. The choice of $\psi = 1$ yields the
Kolmogorov statistic. Anderson and Darling also studied
( 4 
y(t) = ‘1 — t)’ 
0, otherwise, 
but the distribution function is not in usable form. The distribution of
$\sup_{-\infty < x < \infty} \sqrt{n}\,[F(x) - F_n(x)]\,\psi[F(x)]$, when $H_0$ is true, has apparently only
been obtained for the case $\psi = 1$. More recently, Pyke [17] has studied a class of
tests based on a generalized one-sided distance, but again the distributions have
not been given.
TABLE 1
$P_2 = \Pr[\sup\,[u - F_n(u)] \ge \epsilon_n \mid u_0 + \Delta - F_n(u_0 + \Delta - 0) < \epsilon_n]$*

                           n
  $\Delta$        50       100      200      400
  0.05      .0081    .0062    .0035    .0011
  0.10      .0027    .0001      —        —
  0.20      .0002      —        —        —
  0.30        —        —        —        —
  0.40        —        —        —        —
  0.50        —        —        —        —

* Calculations made using formula (38). Entries marked with — are less than .0001.
One asymptotic result of this type is known that could form the basis of a large
sample test of $H_0$. This is the result due to Rényi [18], viz., if $H_0$ is true
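(40)   $\lim_{n\to\infty} P\left\{\sqrt{n} \sup_{a \le u} \frac{F_n(u) - u}{u} < Z\right\} = \sqrt{\frac{2}{\pi}} \int_0^{Z\sqrt{a/(1-a)}} e^{-t^2/2}\,dt, \quad Z > 0,$

           $= 0, \quad Z \le 0,$

for arbitrary a, 0 < a < 1.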

The restriction $a \le u$ is unpleasant since it imposes an additional decision on
the statistician, viz., the choice of a. Furthermore, it is apparent that the test
based on this result cannot be consistent against alternatives which do not differ
from F(x) for the set $\{x: F_0(x) < a\}$. On the other hand the test is consistent
against all other alternatives in @. 

One feature of this test may be noted. The minimum power of the test may be
studied in a manner parallel to that used for the $D_n^-$ test. In particular the
probability of rejection is the probability that the empirical d.f. $F_n(u)$ falls at some
point below the line $u(1 + \epsilon_n) - \epsilon_n$, where $\epsilon_n$ is chosen to satisfy the size
condition; i.e., approximately for large samples


— ¢ 


an 


‘ : | 
(41) é, = Za { 

The primary term of the power function $\beta(G_{m u_0})$ is thus seen to be
approximately


nh — Ze A) Seok, a ee 
(42) a 


= 
| 
me [uo(l — wo)]! 


which for sufficiently large n is minimized when


$u_0 = 1 - a - \Delta.$


Further this minimum power will be an increasing function of a; i.e., increasing a
will increase the minimum power of the test within the class of d.f.’s for which the
test is consistent but at the same time this class will be decreased.


6. Tests based on the integral criterion. To Cramér and Von Mises is due the
idea of testing $H_0$ by a statistic based on the integral of the square of the
difference between hypothetical and empirical distribution functions. Smirnov
modified this by integrating with respect to the probability measure generated by
F(x). A more general form was given by Anderson and Darling [1] (this paper
also gives references to the original authors which have been omitted here). This is


$W_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2\, \psi[F(x)]\, dF(x).$


The limiting distribution of this statistic with the weight function $\psi = 1$ was
given first by Smirnov, then by Von Mises and later by Anderson and Darling.
They also gave a tabulation of the limiting distribution (cf. [1], p. 203). The
latter authors also give the d.f. of $W_n^2$ for the weight function


$\psi(t) = \frac{1}{t(1 - t)},$


but the function is complex and no tabulation has been given.

Before discussing the classical form of $W_n^2$, it is of interest to note that since
we are here considering one-sided alternatives, it is not unreasonable to introduce
as a test statistic


a x »! 


13) e =n [F,.(~) — F(x)| dF (x) = n | [F(u) — ul du. 
A. 0 
It is seen at once that 


(44) rm 2, 


1 


so that the test is equivalent to one based on

(45)   $\bar{U} = \frac{1}{n} \sum_{i=1}^{n} U_i.$


Such a test has also been proposed by L. Moses.
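
The test based on the mean of the transformed observations can be sketched as follows. This is a minimal illustration only: it assumes the $U_i = F_0(X_i)$ are already computed, that they are uniform on (0, 1) under $H_0$ (so their mean has expectation 1/2 and variance 1/(12n)), and that large values of the mean count against $H_0$ because the one-sided alternatives satisfy G(u) < u. The function name `u_bar_test` is hypothetical and not part of the paper.

```python
from statistics import NormalDist, fmean


def u_bar_test(u_values, alpha=0.05):
    """Large-sample one-sided test based on the mean of U_i = F_0(X_i).

    Returns (u_bar, z, reject). Under H_0 the U_i are assumed uniform on
    (0, 1), so u_bar is approximately N(1/2, 1/(12 n)).
    """
    n = len(u_values)
    u_bar = fmean(u_values)
    # Standardize with the null mean 1/2 and null variance 1/(12 n).
    z = (u_bar - 0.5) * (12 * n) ** 0.5
    # Reject for large u_bar: alternatives with G(u) < u shift mass upward.
    critical = NormalDist().inv_cdf(1 - alpha)
    return u_bar, z, z > critical
```

For example, `u_bar_test([0.5] * 100)` gives a standardized value of zero and does not reject, while a sample concentrated near 1 rejects.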

For n large, under $H_0$, $\bar{U}$ is $N(\tfrac{1}{2}, 1/12n)$, while under the alternative G(u) < u
it is normally distributed with mean $\int_0^1 u\,dG(u) > \tfrac{1}{2}$. The variance of $\bar{U}$ under the
alternative is finite so that the test is consistent for all alternatives in &. The test 


is also obviously monotone and hence p.o. For any alternative the large sample 
power is easily computed. In particular 


(46)   $\bar{\beta}_{\bar{U}}(\Delta) = \beta_{\bar{U}}(G_M) = 1 - \Phi\left( \frac{Z_\alpha - \sqrt{3n}\,\Delta(2 - \Delta)}{[1 - \Delta^2(6 - 8\Delta + 3\Delta^2)]^{1/2}} \right),$


(47)   $\beta_{\bar{U}}(G_{m u_0}) = 1 - \Phi\left( \frac{Z_\alpha - \sqrt{3n}\,\Delta^2}{[1 - \Delta^2(6 - 8\Delta + 3\Delta^2) + 12u_0\Delta^2]^{1/2}} \right).$


The minimum of (47) is attained when $u_0 = 0$.
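
As a numerical illustration of the large-sample power of the mean test against the maximum alternative, the following sketch evaluates the normal approximation directly. It is written under an assumption that is not spelled out at this point in the text: the maximum alternative is taken to be G(u) = 0 for u < Δ and G(u) = u − Δ for Δ ≤ u < 1, which gives a mean shift of Δ(2 − Δ)/2 and the variance factor 1 − Δ²(6 − 8Δ + 3Δ²). The function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()


def power_max_alternative(n, delta, alpha=0.05):
    """Approximate power of the mean test against the assumed maximum alternative."""
    z_alpha = _STD.inv_cdf(1 - alpha)
    # Standardized shift of the mean: sqrt(12 n) * delta * (2 - delta) / 2.
    shift = (3 * n) ** 0.5 * delta * (2 - delta)
    # Ratio of alternative to null standard deviation of the mean.
    scale = (1 - delta ** 2 * (6 - 8 * delta + 3 * delta ** 2)) ** 0.5
    return 1 - _STD.cdf((z_alpha - shift) / scale)


def sample_size(delta, target=0.95, alpha=0.05):
    """Smallest n whose approximate power reaches the target (linear search)."""
    n = 1
    while power_max_alternative(n, delta, alpha) < target:
        n += 1
    return n
```

With Δ = 0.01 and α = 0.05, `sample_size` lands within a few observations of the value 9108 that appears in the maximum-alternative part of Table 3; small differences come from rounding of the normal deviate.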
Consider now the classical integral criterion, i.e., 


(48)   $\omega^2 = \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2\, dF(x) = \int_0^1 [F_n(u) - u]^2\, du.$


It is well known that 
and that if $H_0$ is true


(50)   $E(\omega^2) = \frac{1}{6n}, \qquad \sigma^2(\omega^2) = \frac{1}{n^2}\left(\frac{4n - 3}{180n}\right).$


It is known that if $H_0$ is true $n\omega^2$ has a limiting distribution which is not
normal. However, if $H_0$ is false, the limiting distribution of $\omega^2$, appropriately
normalized, is normal.

For if the $U_i$ have a d.f. G(u)


$\int_0^1 [u - G_n(u)]^2\,du = \int_0^1 [u - G(u) + G(u) - G_n(u)]^2\,du$

$= \int_0^1 \delta^2(u)\,du + 2\int_0^1 \delta(u)G(u)\,du - 2\int_0^1 \delta(u)G_n(u)\,du + \int_0^1 [G(u) - G_n(u)]^2\,du,$

where $G_n(u)$ has been written to emphasize that the sample has been drawn from
the population with distribution G and where we have written $u - G(u) = \delta(u)$.
The notation


$\int_0^u \delta(t)\,dt = D(u),$

$\int_0^1 \delta^2(u)\,du + 2\int_0^1 \delta(u)G(u)\,du - 2D(1) + 2E[D(U)] = C(G)$


will also be used.


From Kolmogorov’s theorem that 


52) lim [Pr lV n sup |G(u) — G,(u) 
nen 0< u<l 
it is easily seen that


$\sqrt{n} \int_0^1 [G(u) - G_n(u)]^2\,du$


tends to zero in probability. Also


$\int_0^1 \delta(u)G_n(u)\,du = \frac{1}{n}\sum_{i=1}^{n} \int_{U_i}^{1} \delta(u)\,du = D(1) - \frac{1}{n}\sum_{i=1}^{n} D(U_i).$


Since $D(u) \le \tfrac{1}{2}$ for $0 \le u \le 1$, $E_G[D^2(U)] < \infty$ and hence


$\sqrt{n}\,\left\{\frac{1}{n}\sum_{i=1}^{n} \left[D(U_i) - E[D(U)]\right]\right\}$


is asymptotically normal with mean zero and variance given by the usual formula. 
Finally then $\sqrt{n}\,(\omega^2 - C(G))$ is the sum of an asymptotically normal r.v. and
one tending in probability to zero. It is therefore itself asymptotically normal
with expectation zero.

Define $\omega_\alpha$ by the equation $\Pr[n\omega^2 > \omega_\alpha \mid H_0] = \alpha$.

The $\omega^2$ test (i.e., reject $H_0$ when $n\omega^2 > \omega_\alpha$) is consistent but not monotone. Its
failure to be monotone arises from the fact that the test is two-sided and we are
here considering one-sided alternatives. On the other hand, at least for n
sufficiently large that the term


$\int_0^1 [G(u) - G_n(u)]^2\,du$


is negligible with respect to the other terms of $\omega^2$, the test is p.o. This follows from
the decomposition (51), since the other terms in this expression increase as G
decreases.

The calculation of $E[D(U)]$, $\sigma^2[D(U)]$ is particularly simple for the alternatives
$G_{m u_0}$ and $G_M$. In fact, it is also possible in these cases to calculate
straightforwardly $E(\omega^2)$.


Thus 
(54) 


while 


i _ 
(55) . a 1— ) tag (4). 
nm” a 


The value of $u_0$ which minimizes the function $\beta(G_{m u_0})$ is a rather complicated
expression involving $\Delta$, n and $\omega_\alpha$; however, it is easily seen that as $n \to \infty$ this
minimizing value tends to $\tfrac{1}{2}$. For simplicity we have evaluated only the
approximate large sample power function $\beta(G_{m,1/2})$:


9 7 9 : 1 
(56) 8.s(Gay) = 1 — o[ ( “) - (4+ ') Vin = -|. 
A Vn oO n 3A'/ n 


Similarly evaluating E(w) and o°(w) to terms of order 1/n° 
(57) Bw2(A) = Bu2r(G yy) 1 — &(2), 


where 


— (a° — 24°)V/n 


1 A’ . A 1/2 
2a[ 5 - v3" | 


7. Other tests. A procedure that has been suggested for the problem of combining
tests and which consequently could be adapted to the equivalent problem
of testing $H_0$, is based on the minimum or maximum of the transformed
observations, i.e., in our notation, $U_1'$ or $U_n'$. Even restricting the problem by choosing a
simple univariate statistic such as $U_1'$, does not yield a unique u.m.p. test.
Moreover the “intuitive” test of $H_0$ against one-sided alternatives (i.e., reject
$H_0$ when $U_1' > c$ for appropriately chosen c) is obviously not consistent. In fact,
it is only consistent for those alternatives G(u) < u such that $\lim_{u \to 0} G(u)/u = 0$.
Furthermore, the test (reject $H_0$ when $U_n' > c$) would be consistent for no
alternatives of ©. 

Of more interest are a group of tests based on another class of statistics, the 
so-called spacing of the observations. It is convenient to define 


(59) i «0, = O., 


’ u 12,°---n+ ] 


Various tests based on the statistics $S_i$ have been proposed by Sherman [19] and
others. These tests are not p.o. and hence are excluded from the present study.
The proof of this fact as well as some other properties of these tests will be given
in a later paper.


TABLE 2A

Minimum power of several tests for alternatives whose distance from $F_0$ is $\Delta$


.072 
.102 
131 
648 


100 


O80 
.148 

149 
940 


200 


094 
.232 


803 
999 


400 
8. Comparison of the minimum and maximum powers of consistent, partially
ordered tests. In the preceding sections it has been shown that the tests
associated with the statistics $\pi$ (8), $\pi'$ (9), $D_n^-$ (24), $\bar{U}$ (45) and $\omega^2$ (48) are consistent,
monotone and p.o. Furthermore, useful large sample approximations were found
for $\underline{\beta}(\Delta)$ and $\bar{\beta}(\Delta)$ for each test. In view of the fact that most of these large sample
power functions are expressed in terms of normal probabilities it would not be
difficult to obtain inequalities between the power functions for the different tests.
However, in not all cases does the same relationship between the power functions
persist for all $\Delta$ or all n. Furthermore, such inequalities do not indicate the
magnitude of the power differences.

As a more informative approach calculations have been made of $\underline{\beta}(\Delta)$ and $\bar{\beta}(\Delta)$
for each test for a range of values of n and $\Delta$. These have been calculated for two
test sizes, viz., $\alpha = 0.05$ and $\alpha = 0.01$. The minimum power was calculated for
$\Delta$ = 0.05, 0.1, 0.2, 0.3, 0.4 and 0.5 and, where desirable, some intermediate values,
while the maximum power was calculated for $\Delta$ = 0.01 (0.01) 0.10, 0.15, 0.20,
0.30, 0.40, and 0.50. A fixed sequence of sample sizes n was used, viz.,
n = 50, 100, 200, 400, 600, 800, 1000, 2000, 4000, 6000, 8000, 10,000, ..., with the
stopping rule: stop whenever the absolute value of the normal deviate
exceeded 5.





TABLE 2B


Maximum power of several tests for alternatives whose distance from $F_0$ is $\Delta$





0.01 0.02 0.03 0.04 0 0.06 0.07 0.08 0.09 0.10 0.125 0.15 ).2 

















824 .916 
096 168 | .267 | .387 | .518 | .646  .759 | .849 | .914 | .955.. .994 |1 

.118 | .277 4123 | .562 .668 .754. .818  .869 — 999 
.459 .578 





n = 200 








0.04 0.05 0.06 0.07 0.08 0.09 1.0 











840 .950 .989  .999 
( 124 .250  .421  .609 | .773 | .889 | .954 | .984 | .996 | .999 
2 -- .093 | .317 | .529 | .675 | .797 | .873 | .925) .955 | .975 


421 








.587 | .755 | .897 | .983 1 
Some of the powers so calculated for $\alpha = 0.05$ are exhibited in Tables 2A and
2B. It might be hoped that some of the tests could be eliminated by such a
comparison; this would be the case if $\bar{\beta}$ for one test fell below $\underline{\beta}$ for some other
test. However, this is not the case.

In general the tables indicate that the relationship of the tests is reversed from 
the minimum to the maximum power. Thus we have 


o, - Bx < B(w), Bu < Bp, < Boz < B(w’), Bu < Be < Bb; 


~ 


The relationship between the $\omega^2$ and $\bar{U}$ tests varies with $\Delta$ and n.

It is evident that the $\pi'$ test has the best maximum power of the tests
considered, but its minimum power (and that of the $\pi$ test) is extremely low. On the
other hand the $D_n^-$ test which has the lowest maximum power (of the tests
considered) has the greatest minimum power. This raises the question whether there
exists a non-trivial test which is p.o. and for which $\underline{\beta}(\Delta) = \bar{\beta}(\Delta)$.

An alternative comparison between the tests is given in Table 3, which shows
the sample sizes necessary to achieve a pre-assigned power level $\beta$, for given $\Delta$
and for $\alpha = 0.05$. The values corresponding to $\beta = 0.95$ only are listed though
corresponding values of n have been calculated for $\beta = 0.90$ and $\beta = 0.99$. The
latter calculation emphasizes the poorness of the $\pi$, $\pi'$ tests against alternatives
$G_{m u_0}$: over 2,443,900 observations are required to insure $\underline{\beta}(0.05) = 0.99$. It
should be noted that these values of n were calculated from the primary term

TABLE 3
Sample sizes necessary that $\underline{\beta}(\Delta)$ and $\bar{\beta}(\Delta)$ = 0.95 for several p.o. tests and
for $\alpha$ = 0.05


Minimum Alternative

                                    $\Delta$
Test            0.05       0.1       0.2      0.3      0.4      0.5
$D_n^-$           1675       419       105       47       27       17
$\omega^2$       14,038      2290       406      153       78       45
$\bar{U}$       569,067    34,233      1867      304       77       25
$\pi$, $\pi'$  1,677,025   102,081    23,903     4463     1325      511

Maximum Alternative

                                        $\Delta$
Test        0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09    0.10
$D_n^-$    29,679    7420    3298    1855    1188     825     606     464     367     297
$\omega^2$    4761    1057     540     302     204     160     104      80      65      53
$\bar{U}$     9108    2296    1027     583     375     261     193     148     117      95
$\pi'$        3067     936     471     291     200     148     115      92      77      65
$P_1$ [cf. (29)] and consequently the required sample size with the $D_n^-$ test is slightly
over-estimated.

It is also to be noted that the smaller sample sizes indicated in Table 3 must not
be construed too literally since they have been computed from asymptotic
formulae.

Of these tests considered it appears that if no information is available on the
possible alternatives to $H_0$ then from some minimax point of view, the $D_n^-$ test is
the most favorable.
ON THE NONRANDOMIZED OPTIMALITY AND RANDOMIZED
NONOPTIMALITY OF SYMMETRICAL DESIGNS¹


By J. Kiefer²
Cornell University

0. Summary. Many commonly employed symmetrical designs such as Balanced
Incomplete Block Designs (BIBD’s), Latin Squares (LS’s), Youden
Squares (YS’s), etc., are shown to have optimum properties among the class of
non-randomized¹ designs (Section 3). This represents an extension of a property
first proved by Wald for LS’s in [1]; a similar property demonstrated by Ehrenfeld
for LS’s in [2] (as well as a third optimum property considered here) is shown
to be an immediate consequence of the Wald property, and the Wald property is
shown to be the more relevant when one considers optimality rigorously
(Section 2). Surprisingly, all of these optimum properties fail to hold if randomized¹
designs are considered (Section 4); the results of Sections 2 and 3, as well as those
appearing previously in the literature (as in [1], [2], [3]) must be interpreted in
this sense. Generalizations of the BIBD’s and YS’s, for which analogous results
hold, are introduced.


1. Introduction. Wald [1] stated an optimality criterion (called E-optimality in
Section 2) for designs used in testing hypotheses in the setting of two-way soil
heterogeneity where LS’s are commonly employed, and succeeded in proving
that a slightly different criterion (called D-optimality in Section 2) is satisfied
by the LS design. Wald also stated that an analogous result holds for Graeco-Latin
Squares and higher Latin Squares. This statement gives rise to speculation
when one considers that, in a 3 × 3 Graeco-Latin Square (or, more generally, in
an n × n square of order n − 1), there are no degrees of freedom for error: this
implies that any test (e.g., of the hypothesis $H_0$ that there are no treatment
effects) whose size (= supremum of the power function under $H_0$) is $\alpha$, has a
power function whose infimum over any of the contours usually considered
($\psi(\mu)/\sigma^2$ = constant, as discussed in the sequel) is $\le \alpha$. It is easy to construct a
better design, i.e., one for which the infimum of the power function of some test
over such a contour is > the size of the test; for example, for each of the two

Received July 8, 1957; revised January 22, 1958.

¹ One of the referees of this paper felt that the following remark on nomenclature should
be included: Throughout this paper, the term randomized design is used in describing a
statistical procedure which chooses according to a prescribed probability mechanism a
member of a given class of ordinary designs, the chosen design being the one actually used;
a precise definition is given in the text. The properties of such a procedure take into account
the probabilities of the various possible choices. A nonrandomized design chooses one member
of the given class with probability one. The customary usage of the phrase randomized design
in the design of experiments can be viewed as a special case of the decision-theoretic usage
employed here, but the reader is warned not to interpret the phrase in that narrower sense.

² Research sponsored by the Office of Naval Research.

factors, with probability ½ use an ordinary LS design on the three levels of that
factor holding the level of the other factor fixed.³

The phenomenon just described makes one wonder whether the optimality
result for ordinary LS’s also fails to hold if one permits comparison with
randomized designs.¹ At the same time, the question arises whether an analogue of
the limited optimality property of the LS (or Graeco-LS) design holds in a wide
class of design settings for designs with suitable symmetry properties, and
whether these designs fail to be optimum when compared with randomized
designs.¹ This paper answers these questions affirmatively.

In Section 2A we define four optimality criteria (D-, E-, M-, and L-optimality)
for designs (especially, for the normal case); Wald [1] and Ehrenfeld [2] proved
D- and E-optimality, respectively, for the LS design. It is indicated why
M-optimality, the strongest and least artificial of the four, seems very difficult to
verify in most problems (although L-optimality, which is a local version of
M-optimality, can sometimes be verified). At the same time, we list briefly for later
reference the known results on the Analysis of Variance Test which are used in
optimality considerations, and point out the incorrectness of tacitly assuming
(as previous work in this area has done) that one should use that test, whatever
design is chosen. In Section 2B we indicate by example why E-optimality seems,
at least in the present state of knowledge indicated in 2A, the least satisfactory
of the criteria considered; the connection of D-optimality with Isaacson’s notion
of type D tests [11] is examined. In Section 2C it is shown in a general setting
where there is suitable symmetry that D-optimality implies E-optimality and
L-optimality.

In Section 3A it is indicated why the treatment of LS’s is much simpler than
that of YS’s, BIBD’s, etc., and the general treatment of incomplete block designs

³ It should be evident that the example of the 3 × 3 Graeco-Latin square, as well as the
example discussed in the fourth paragraph below wherein two observations are taken, are of
no practical importance; these simple examples are given to illustrate the general principles
of Section 4. Those principles show that a precise study of certain optimality criteria for
designs associated with familiar problems of testing hypotheses, can lead to the unexpected
conclusion that certain intuitively unappealing randomized designs are superior to certain
intuitively appealing nonrandomized symmetrical designs. The principles are less
transparent (although applicable) in the context of applicationally meaningful problems such as
those of Section 4, than in the simple examples; hence, the latter examples are discussed
first. The present comments are included because two referees apparently read these simple
examples as practical suggestions. In the same light, it is clear that the design $\delta$ in the
fourth paragraph below, as well as its analogues in Section 4, is not suggested to the
practical worker who wants estimates of all treatment effects; for these designs illustrate a
non-optimality property of classical nonrandomized symmetrical designs in hypothesis testing,
and a local property at that (see Section 5.4). In fact, the results of Section 4 are not even
relevant for most estimation problems (see Section 5.2). To the practical worker who objects
(as at least one has) to the conclusions of Section 4 on the grounds that one should not use a
design which does not estimate all treatment effects, it should be pointed out that (1) the
classical nonrandomized symmetrical design may still possibly possess certain global
optimality properties (see Section 5.4), and (2) perhaps his problem is not really one of
testing hypotheses.

of Bose [4] is briefly recalled; this treatment proves more useful in Section 3C than
the more direct least squares approach used in [1] and [2] would be. In Section
3B several algebraic propositions (emphasizing the role of symmetry) are
verified, which can be used to prove D- and E-optimality in important examples.
Several such examples are considered in Section 3C, including generalizations of
the BIBD’s and the YS’s.

Section 4 contains two theorems the consequences of which are that
nonrandomized symmetrical designs are not optimum if randomization is permitted.
In Section 4B it is shown that, whether or not the variance is known, for $\alpha$
sufficiently small there is a randomized design whose power function is uniformly
larger than that of the symmetrical design in some neighborhood of the
hypothesis $H_0$ that all treatment effects are the same. This is slightly less transparent
than the result of Section 4A, which gives an analogous result for all $\alpha$ when the
above $H_0$ is replaced by the hypothesis that all treatment effects are equal to
some specified value. The latter result can best be understood by considering the
simplest example³: Suppose $X_{ij}$ is normal with unit variance and mean $\mu_i$ and that
all $X_{ij}$ are independent (i, j = 1, 2). Our problem is to select (before observation)
exactly two of the $X_{ij}$ and use them to test $\mu_1 = \mu_2 = 0$ against some class of
alternatives. The symmetrical design d (say) selects $X_{11}$ and $X_{21}$ and uses the
usual $\chi^2$ test, and obviously has constant power > $\alpha$ on the contour $\mu_1^2 + \mu_2^2 = c > 0$,
while either of the designs $d_i$ (i = 1, 2), where $d_i$ uses $X_{i1}$ and $X_{i2}$, has
$\alpha$ for the infimum of the power function on this contour. Let $\delta$ be the randomized
design¹ obtained by using $d_1$ or $d_2$ with probability ½ each. It is easily seen that,
for $\mu_1$ and $\mu_2$ near 0, the power function of $\delta$ is $\alpha + c_1(\mu_1^2 + \mu_2^2)$ + terms of higher
order, where $c_1 > 0$. Thus, on the contour $\mu_1^2 + \mu_2^2 = c > 0$ with c small, the
power function of $\delta$ is almost constant and hence approximately equal to the
value at $\mu_1 = \mu_2 = (c/2)^{1/2}$. Thus, in comparing d and $\delta$ near $H_0$, we may to a
first approximation assume $\mu_1 = \mu_2$. But $\delta$ is clearly optimum for testing
$\mu_1 = \mu_2 = 0$ assuming $\mu_1 = \mu_2$, while d (whose test is based on $X_{11}^2 + X_{21}^2$) is not.
This explains why, for c small, $\delta$ has a power function greater than that of d.
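
The comparison in this two-observation example can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: the symmetrical design d is taken to use one observation on each mean with the chi-square test on 2 degrees of freedom (noncentrality μ₁² + μ₂²), and each $d_i$ is taken to use the two-sided normal test based on the sum of its two observations (a chi-square test on 1 degree of freedom with noncentrality 2μᵢ²). All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def power_one_df(noncentrality, alpha):
    """Power of the two-sided size-alpha normal test (chi-square, 1 d.f.)."""
    z = _STD.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(noncentrality)
    return (1 - _STD.cdf(z - shift)) + _STD.cdf(-z - shift)


def power_two_df(noncentrality, alpha, terms=80):
    """Power of the size-alpha chi-square test with 2 d.f.

    Uses the Poisson-mixture form of the noncentral chi-square law; with
    2 + 2j degrees of freedom the central tail at x is
    exp(-x/2) * sum_{i<=j} (x/2)^i / i!, and exp(-x/2) = alpha at the
    critical value x = -2 log(alpha).
    """
    half_x = -math.log(alpha)
    half_lam = noncentrality / 2.0
    weight = math.exp(-half_lam)      # Poisson weight for j = 0
    term, partial = 1.0, 1.0          # (x/2)^j / j! and its running sum
    total = 0.0
    for j in range(terms):
        total += weight * alpha * partial
        weight *= half_lam / (j + 1)
        term *= half_x / (j + 1)
        partial += term
    return total


def power_d(mu1, mu2, alpha=0.05):
    """Symmetrical design d: one observation on each mean."""
    return power_two_df(mu1 ** 2 + mu2 ** 2, alpha)


def power_delta(mu1, mu2, alpha=0.05):
    """Randomized design: d_1 or d_2, each with probability 1/2."""
    return 0.5 * (power_one_df(2 * mu1 ** 2, alpha)
                  + power_one_df(2 * mu2 ** 2, alpha))


def compare_on_contour(c, alpha=0.05, points=8):
    """Pairs (power of d, power of delta) at points with mu1^2 + mu2^2 = c."""
    pairs = []
    for k in range(points):
        theta = 2 * math.pi * k / points
        mu1 = math.sqrt(c) * math.cos(theta)
        mu2 = math.sqrt(c) * math.sin(theta)
        pairs.append((power_d(mu1, mu2, alpha), power_delta(mu1, mu2, alpha)))
    return pairs
```

Calling `compare_on_contour(0.1)` returns, for each point on the contour, a power for the randomized design that exceeds the (constant) power of d, in line with the local argument above; the sketch makes no claim about large c.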

Many of the results of this paper have counterparts for problems of point and 
interval estimation, for other distributions, etc. Such extensions and generaliza- 
tions, as well as various other remarks, are stated in Section 5. 

In design settings where no suitably symmetric design exists, it is often tedious
algebraically to show that a design which is “closest to symmetrical” is optimum
(if it is optimum: see the example of Section 2B), and we omit such
considerations here. On the other hand, the conclusions of Section 4 have little to do with
whether or not symmetrical designs are being considered.

Throughout this paper, except where explicitly stated to the contrary, Y will
denote an N element column vector whose components $Y_i$ are independent
normal random variables with common variance $\sigma^2$ (it will be explicitly stated
whenever $\sigma^2$ is assumed known; whether or not $\sigma^2$ is known has very little effect
on our results); $\mu$ is an unknown m-vector, $X_d$ is a known N × m matrix
depending on an index d (the “design”) and which will be described further below,
and the expected value of Y when $\mu$ and $\sigma^2$ are the parameter values and when
the design d is used is


(1.1)   $E_{\mu,\sigma^2;d}\,Y = X_d\,\mu.$


$X_d$ is, within limits, subject to choice by the experimenter. (In many applications
it is a matrix of zeros and ones.) We denote by $\Delta$ the set of choices of the index d
which are available to the experimenter. A randomized design¹ $\delta$ is a probability
measure on $\Delta$ (the latter will usually be finite in this paper, and measurability
considerations will be trivial otherwise) which is used by selecting a d from $\Delta$
according to this measure and then using the selected d. We denote the class of
available $\delta$ by $\Delta_R$.

In many problems, one imposes an additional assumption of the form $\Gamma\mu = \gamma$,
where $\Gamma$ and $\gamma$ are known g × m and g × 1 matrices. Such an assumption can be
absorbed into (1.1) and we suppose this to have been done, with no loss of
generality.

A hypothesis $H_0$ will in this paper be of the form $R\mu = 0$, where R is a specified
r × m matrix ($r \le m$) which we can take to be of rank r with no loss of
generality. For simplicity, we can think of the class $H_1$ of alternatives as being all $\mu$
for which $R\mu \ne 0$. (For simplicity, we assume that $\sigma^2$ is either known exactly or
else is known only to be positive, under both $H_0$ and $H_1$.) A hypothesis of the
form $R\mu = \rho$ is easily reduced to the above form by letting $\mu^0$ satisfy $R\mu^0 = \rho$
and replacing Y by $Y^* = Y - X_d\mu^0$ and $\mu$ by $\mu^* = \mu - \mu^0$ in (1.1).

We introduce some notation to be used in Section 2. We denote the k × k
identity matrix by $I_k$. The transpose of a matrix A is written A′. It may or may
not be that all r elements of $R\mu$ are estimable when a given design d is used.
Suppose that there are $s_d$ linearly independent linear combinations of the
elements of $R\mu$ which have unbiased estimators when d is used, but not $s_d + 1$
such combinations. Then there is an $s_d \times r$ matrix $Q_d$ such that there exist linear
unbiased estimators of all components of $Q_dR\mu$ when design d is used; let $t_d$ be
the $s_d$-vector of such estimators with minimum variance (“best linear estimators”
or b.l.e.’s), and let $\sigma^2V_d$ be the covariance matrix of the components of $t_d$.
When $s_d = r$, we may take $Q_d$ to be the identity; for this choice of $Q_d$, we shall
denote $V_d$ by $\bar{V}_d$. Let $b_d$ be the rank of $X_d$. Then there are $b_d$ linearly
independent combinations of the components of $\mu$ which are estimable when d is used.
Of these, $s_d$ of them can be taken to be the elements of $Q_dR\mu$; thus, there exists a
$(b_d - s_d) \times m$ matrix $J_d$ of rank $b_d - s_d$ whose rows are orthogonal to those of
$Q_dR$ (i.e., $J_dR'Q_d' = 0$) and such that all components of $J_d\mu$ have unbiased
estimates when d is used. Let $L_d$ be the $b_d \times m$ matrix whose first $b_d - s_d$ rows are
$J_d$ and whose last $s_d$ rows are $Q_dR$. Let $S_d$ be the usual best unbiased estimator of
$\sigma^2$ (if it is unknown), so that $(N - b_d)S_d/\sigma^2$ has the $\chi^2$-distribution with
$h_d = N - b_d$ degrees of freedom (it may be that $h_d = 0$ and there is no $S_d$). For any
test $\phi_d$ associated with d, let $\beta_{\phi_d}(\mu, \sigma^2)$ be the power function of $\phi_d$ (of course,
$\beta_{\phi_d}$ actually depends on $\mu$ only through $L_d\mu$). For $0 < \alpha < 1$ we denote by
$H_d(\alpha)$ the class of all $\phi_d$ of size $\alpha$, i.e., all $\phi_d$ for which


(1.2)   $\beta_{\phi_d}(\mu, \sigma^2) \le \alpha$ whenever $R\mu = 0$;


and by $H_d^*(\alpha)$, the class of similar tests of size $\alpha$, i.e., those for which (1.2) holds
with the inequality sign replaced by equality. Finally, let $F_{d,\alpha}$ denote the usual
F-test of $H_0$ of size $\alpha$ with $s_d$ and $h_d$ degrees of freedom, based on $t_d'V_d^{-1}t_d/s_dS_d$
(if $\sigma^2$ is known, this is replaced by the appropriate $\chi^2$-test).

The symbol $g_{i,j}(\alpha)$ is used to denote the derivative at $H_0$ of the power function
of the F-test of size $\alpha$ and i, j degrees of freedom, with respect to (a common
choice of) the parameter on which it depends; specifically, if r = m = i,
N − r = j, the matrices R, $Q_d$, and $V_d$ are the identity, and the true values of $\mu$ and
$\sigma^2$ are such that $\mu'\mu/\sigma^2 = \lambda$, then, as $\lambda \to 0$, the power function of $F_{d,\alpha}$ is


(1.3)   $\alpha + g_{i,j}(\alpha)\lambda + O(\lambda^2).$


The results of this paper can be stated in a very general setting involving
invariance of $\Delta$, of the restriction $R\mu = 0$, and of a generalization of the function
$\psi$ considered below, as well as of certain designs, under an appropriate group of
permutations of the components of $\mu$. However, in order to make our proofs
(and, in particular, the role of symmetry) as transparent as possible, we will
carry them out in two cases; the reader will not find it difficult to state our results
more generally by making appropriate linear transformations, etc. The two cases
($\Delta$ and $X_d$ being further specified in particular examples; the role of the function
$\psi$ which distinguishes contours on which the power function is examined, will be
seen in Section 2A) are:


Case I:   $\psi(\mu) = \sum_{i=1}^{u} \mu_i^2$ and $R = R_I$;


Case II:  $\psi(\mu) = \sum_{i=1}^{u} (\mu_i - \bar{\mu})^2$ and $R = R_{II}$;


here we have written $\mu' = (\mu_1, \cdots, \mu_m)$, and $\bar{\mu} = \sum_{1}^{u} \mu_i/u$, while $R_I$ is the
u × u identity followed by m − u columns of zeros (so $R_I\mu = 0$ means
$\mu_1 = \cdots = \mu_u = 0$), and $R_{II}$ is a (u − 1) × u matrix P followed by m − u columns of
zeros, where P consists of the last u − 1 rows of a u × u orthogonal matrix O
whose first row elements are all $1/\sqrt{u}$ (so $R_{II}\mu = 0$ means $\mu_1 = \cdots = \mu_u$).
The optimality results which hold in Case I are usually much more trivial to
obtain than those of Case II, and Section 3B will therefore be mainly devoted
to results applicable to the latter case, it being clear how to obtain the
corresponding results in the former case.
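
A small numerical sketch may help with the matrix used in Case II. It builds one particular orthogonal matrix whose first row is constant (the Helmert matrix, chosen here only for illustration; the text requires just some orthogonal O with that first row), forms the matrix from its last u − 1 rows padded with m − u zero columns, and checks two facts that follow from orthogonality: the squared length of the product with μ equals the sum of squared deviations of the first u components from their mean, and the product vanishes exactly when those components are equal. The function names are hypothetical.

```python
import math


def helmert(u):
    """A u x u orthogonal matrix whose first row has all entries 1/sqrt(u)."""
    rows = [[1.0 / math.sqrt(u)] * u]
    for k in range(1, u):
        scale = math.sqrt(k * (k + 1))
        # k equal positive entries, one negative entry, then zeros.
        rows.append([1.0 / scale] * k + [-k / scale] + [0.0] * (u - k - 1))
    return rows


def r_case_two(u, m):
    """Last u - 1 rows of the Helmert matrix, followed by m - u zero columns."""
    return [row + [0.0] * (m - u) for row in helmert(u)[1:]]


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def psi_case_two(mu, u):
    """Sum of squared deviations of the first u components from their mean."""
    mean = sum(mu[:u]) / u
    return sum((x - mean) ** 2 for x in mu[:u])
```

For instance, with `mu = [1.0, 2.0, 4.0, 7.0, -3.0]`, `u = 3` and `m = 5`, the squared length of `matvec(r_case_two(3, 5), mu)` agrees with `psi_case_two(mu, 3)`, and replacing the first three components by a common value gives a zero vector.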


2. Optimality criteria. 

2A. Preliminaries. For a fixed design d, the test /y.. is known to have several 
optimum properties, which we now list (there are obvious analogues when 
o is known): 
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(a) If sz = 1 (and only then), among tests in Ha(a) which are unbiased (this 
implies that the tests are in Hi(a)), Fa, is uniformly most powerful (UMP). 
See [5] (a trivial completeness argument characterizing similar tests is all that is 
required to allow the Jau which is not present in [5] to be introduced, carrying 
through the argument there for each fixed value of the b.lLe. of Ja). 

(b) Among tests in Ha(a), Fa,4 is UMP invariant (under the usual group of 
transformations when the problem is reduced to canonical form). See [5]. 

(c) (Wald’s theorem) Among tests in H7(a), for each c > 0, o > O, and 
value of Jau, the test Fa,. maximizes the Lebesgue integral of y4,(v, Jau, 0) on 
the sphere v’y = c, where vy = G,QuRy with Gz nonsingular sz X sq is such that 
the b.1.e.’s of the components of v have o” times the identity for their covariance 
matrix (i.e., v is the vector of parameters about which Ho is concerned in the 
canonical form of the problem), and where y4,(@GQuRu, Jay, 0) = By,(u, 0°). 
See [6] or [7] (the parenthetical remark at the end of (a) is relevant to [7] here). 

(d) (Hsu’s theorem, a consequence of (c)) Among tests in H.(a) whose power 
function depends only on Ag = p’R’'QiVa'QuRu/o’ (this implies that the tests are 
in H3(a«)), Fa. is UMP. See {8}. 

(e) Among tests in Ha(a), Fa,a is minimax (over H,) for a variety of weight 
functions, e.g., any nonnegative function of the Aq of (d); in particular, Fa,. 
maximizes the minimum power on the contour \u = ¢ for each c > 0. See [9] or 
[10] (the result follows from (c) if we restrict consideration to H7(a)). 

(f) (A special case of (e)) Fa,4 is most stringent in H4(a). See [9] or [10]. 

(g) (A consequence of (c)) Fa, is of type D in Ha(a). (See [11] or Section 2B 
below for definition of type D, and Section 2B for a proof.) 

It is to be noted that all the above criteria of optimality of the test $F_{d,\alpha}$ are relative to the design $d$. Thus, it is an error to assume (as has been done in previous papers on optimum designs) in a logical approach to optimum design problems that one should automatically use the test $F_{d,\alpha}$, whatever the chosen $d$, when a reasonable criterion for optimality of a design, or of a test for a given design, may dictate the use of a test other than $F_{d,\alpha}$. In fact, the example of Section 2B really illustrates that the use of $F_{d,\alpha}$ need not lead to an optimum design or test for many reasonable definitions of optimality; and the fact that it seems difficult (for many reasonable optimality criteria such as M-optimality, and for many common design problems) to characterize the appropriate test, is what makes it much harder than it has been thought to give a rigorous demonstration of the optimality of various common designs. We now list four optimality criteria for designs (there are many other obvious similar ones); the discussion of their meaning immediately follows the fourth definition.

M-optimality: For $c > 0$ and $0 < \alpha < 1$, a design $d^*$ is said to be $M_{\alpha,c}$-optimum in $\Delta$ if, for some $\phi_{d^*}$ in $H_{d^*}(\alpha)$,

(2.1)    $\inf_{\Gamma_c} \beta_{\phi_{d^*}}(\mu, \sigma^2) = \max_{d \in \Delta} \sup_{\phi \in H_d(\alpha)} \inf_{\Gamma_c} \beta_\phi(\mu, \sigma^2)$,

where $\Gamma_c$ is the set of all $\mu, \sigma^2$ for which $\psi(\mu)/\sigma^2 = c$.

L-optimality: A design $d^*$ is said to be $L_\alpha$-optimum in $\Delta$ if, for some $\phi_{d^*}$ in $H_{d^*}(\alpha)$,

(2.2)    $\lim_{c \to 0} [a_{\phi_{d^*}}(c) - \alpha]/[b(c) - \alpha] = 1$,

where $a_{\phi_{d^*}}(c)$ and $b(c)$ are the expressions on the left and right sides of (2.1), respectively. A design is said to be L-optimum in $\Delta$ if it is $L_\alpha$-optimum in $\Delta$ for $0 < \alpha < 1$.
D-optimality: A design $d^*$ is said to be D-optimum in $\Delta$ if

(2.3)    $\det V_{d^*} = \min_{d \in \Delta'} \det V_d$,

where $\Delta'$ is the set of $d$ in $\Delta$ for which $s_d = r$, and if $d^* \in \Delta'$.

E-optimality: A design $d^*$ is said to be E-optimum in $\Delta$ if

(2.4)    $\pi(V_{d^*}) = \min_{d \in \Delta'} \pi(V_d)$

and if $d^*$ is a member of $\Delta'$, where $\pi(V_d)$ is the maximum eigenvalue of $V_d$.

The above definitions will also be used with $\Delta$ replaced by $\Delta_R$. In that case, for any $\delta$, $V_\delta^{-1}$ is defined to be the expected value under $\delta$ of $V_d^{-1}$, the latter being replaced by the inverse of the covariance matrix of the b.l.e. of the estimable components of $R\mu$ (with zeros adjoined to this inverse in appropriate places to make it $r \times r$) if $s_d < r$; $\Delta_R'$ is then the set of $\delta$ for which $V_\delta^{-1}$ is nonsingular. (This $V_\delta^{-1}$ appears in computing certain $\beta_\phi$ near $H_0$.)

D-optimality and E-optimality have been discussed in [1] and [2] and will also be discussed in Section 2B, where it will be seen that they have to do with local properties (near $H_0$) or optimum properties assuming the use of $F_{d,\alpha}$. Unfortunately, $M_{\alpha,c}$-optimality in $\Delta$ (or, better, $M_{\alpha,c}$-optimality in $\Delta$ simultaneously for all $c$) seems very difficult to verify, even in many simple problems, although it does not require much temerity to conjecture that it holds in such cases as those discussed in Section 2C. A similar remark applies to L-optimality (see, however, Lemma 2.2), a local version (near $H_0$) of M-optimality. The source of this difficulty in verifying M-optimality is illustrated by the example of Section 2B; it is simply that for fixed $d$ the test which achieves the supremum over $\phi$ on the right side of (2.1) need not be $F_{d,\alpha}$ and is generally hard to compute (as is therefore the right side of (2.1)).

2B. D- and E-optimality. We begin by describing the meaning of E-optimality (which criterion is stated in [1] and is verified for the LS design in [2]). Suppose, for fixed $\alpha$, that we agreed to restrict ourselves to using $F_{d,\alpha}$, whatever $d$ is chosen. The power function of $F_{d,\alpha}$ is then a strictly increasing function of $\lambda_d$ (defined in Section 2A(d)). Now, in either Case I or II, for any $c > 0$, if we want a design $d$ for which $F_{d,\alpha}$ maximizes the minimum power on the contour $\psi(\mu)/\sigma^2 = c$ (i.e., which is $M_{\alpha,c}$-optimum in $\Delta$ under the additional restriction that we use $F_{d,\alpha}$), we may restrict our attention to $\Delta'$ (since, for $s_d < r$, the infimum of $\beta_{F_{d,\alpha}}$ on the contour $\psi(\mu)/\sigma^2 = c$ is $\alpha$; if $\Delta'$ is empty, there is no problem). $F_{d,\alpha}$ has the same number of numerator degrees of freedom for all $d$ in $\Delta'$; if also $b_d$ is the same for each $d$ in $\Delta'$ (this is often the case in important examples such as those of Section 3C) so that the denominator degrees of freedom are the same for all $F_{d,\alpha}$, then a design which maximizes the minimum power on $\psi(\mu)/\sigma^2 = c$ simultaneously for all $c$ is precisely one which maximizes the minimum of $\lambda_d$ subject to $\psi(\mu)/\sigma^2 = c$. Since $\psi(\mu) = (R\mu)'(R\mu)$ in both Cases I and II, this means maximizing $\min_{\xi'\xi = 1} \xi'V_d^{-1}\xi = 1/\pi(V_d)$. This is precisely the criterion of E-optimality.

One can cite many practical examples to illustrate that the restriction to using $F_{d,\alpha}$, which is imposed in order to make E-optimality meaningful, can have serious detrimental consequences. The simplest possible situation will suffice as an example: Suppose $N > 2$, $r = m = 2$, $R = R_I$, and $\Delta'$ to consist of two designs with

$V_{d_1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad V_{d_2} = \begin{pmatrix} 1+\epsilon & 0 \\ 0 & \epsilon \end{pmatrix},$

where $\epsilon > 0$. Clearly, $d_1$ is E-optimum. Moreover, if $d_1$ is used, optimum property (e) above states that, for every $c$, $F_{d_1,\alpha}$ maximizes the minimum power on the contour $(\mu_1^2 + \mu_2^2)/\sigma^2 = c$ among all tests in $H_{d_1}(\alpha)$. However, if $d_2$ is used, $F_{d_2,\alpha}$ does not have this property. For example, if $d_2$ is used, let $\phi'$ be the test which with probability $(1+\epsilon)/(1+2\epsilon)$ uses the F-test (with 1 and $N-2$ degrees of freedom) of size $\alpha$ of the hypothesis $\mu_1 = 0$, and which with probability $\epsilon/(1+2\epsilon)$ uses the F-test of size $\alpha$ of the hypothesis $\mu_2 = 0$. The power function of $\phi'$ near $(\mu_1^2 + \mu_2^2)/\sigma^2 = 0$ is then

$\alpha + g_{1,N-2}(\alpha)\,(\mu_1^2 + \mu_2^2)/[(1+2\epsilon)\sigma^2] + o([\mu_1^2 + \mu_2^2]/\sigma^2),$

while that of $F_{d_2,\alpha}$ is

$\alpha + g_{2,N-2}(\alpha)\left(\frac{\mu_1^2}{1+\epsilon} + \frac{\mu_2^2}{\epsilon}\right)/\sigma^2 + o([\mu_1^2 + \mu_2^2]/\sigma^2).$

The infimum of the expression multiplying $g_{2,N-2}(\alpha)$, taken on the contour $(\mu_1^2 + \mu_2^2)/\sigma^2 = c$, is $c/(1+\epsilon)$, compared with $c/(1+2\epsilon)$ for the coefficient of $g_{1,N-2}(\alpha)$; since $g_{1,N-2}(\alpha)/g_{2,N-2}(\alpha) \to 2$ as $\alpha \to 0$ (see Lemma 4.5 below) the assertion three sentences above regarding $F_{d_2,\alpha}$ is verified. Moreover, since the power function of $F_{d_1,\alpha}$ is

$\alpha + g_{2,N-2}(\alpha)\,(\mu_1^2 + \mu_2^2)/\sigma^2 + o([\mu_1^2 + \mu_2^2]/\sigma^2),$

we see similarly that, at least for $\alpha$, $\epsilon$, and $c$ sufficiently small, $d_1$ is not $M_{\alpha,c}$-optimum or $L_\alpha$-optimum, $\phi'$ being locally uniformly more powerful than $F_{d_1,\alpha}$; thus, the assertion of the first sentence of this paragraph regarding E-optimality is verified.
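The comparison made in this example can be checked numerically. The following is a minimal sketch, not part of the original argument: the function name `compare_designs` and the trial value of $\epsilon$ are assumptions, and the two designs are taken to have $V_{d_1}$ equal to the identity and $V_{d_2}$ diagonal with entries $1+\epsilon$ and $\epsilon$. It reports the E- and D-criteria and the value that the ratio $g_{1,N-2}(\alpha)/g_{2,N-2}(\alpha)$ must exceed for $\phi'$ (used with $d_2$) to have the larger minimum local slope.

```python
def compare_designs(eps):
    """Criteria for V_d1 = diag(1, 1) and V_d2 = diag(1 + eps, eps), eps > 0."""
    v1 = (1.0, 1.0)          # diagonal of V_{d1}
    v2 = (1.0 + eps, eps)    # diagonal of V_{d2}
    # E-criterion: maximum eigenvalue of V_d (smaller is better).
    e1, e2 = max(v1), max(v2)
    # D-criterion: det V_d (smaller is better).
    det1, det2 = v1[0] * v1[1], v2[0] * v2[1]
    # Minimum, over the contour (mu1^2 + mu2^2)/sigma^2 = 1, of the
    # coefficient that multiplies g_{2,N-2} (F-tests) or g_{1,N-2} (phi').
    coef_f_d1 = 1.0 / e1                   # F-test under d1
    coef_f_d2 = 1.0 / e2                   # F-test under d2: 1/(1 + eps)
    coef_phi_d2 = 1.0 / (1.0 + 2.0 * eps)  # phi' under d2
    return {
        "E": (e1, e2),
        "D": (det1, det2),
        # phi' (with d2) is locally better than the F-test under d2
        # (resp. d1) when g1/g2 exceeds the corresponding threshold.
        "ratio_needed_vs_F_d2": coef_f_d2 / coef_phi_d2,
        "ratio_needed_vs_F_d1": coef_f_d1 / coef_phi_d2,
    }
```

For $\epsilon = 0.1$ the two thresholds are $(1+2\epsilon)/(1+\epsilon) \approx 1.09$ and $1+2\epsilon = 1.2$, both below the limiting value of the ratio quoted in the text, while $d_1$ has the smaller maximum eigenvalue and $d_2$ the smaller determinant.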

Of course, for any fixed $\alpha$, $\epsilon$, and $c$ we have not asserted that the test $\phi'$ (considered above only for illustrative purposes) is $M_{\alpha,c}$-optimum. If one uses $d_2$, the power functions of $\phi'$, $F_{d_2,\alpha}$, etc., are not constant on $(\mu_1^2 + \mu_2^2)/\sigma^2 = c$ (the same is true of the test which minimizes the integral of the power function on that contour), and the computation of the supremum over $\phi$ on the right side of (2.1) does not seem easy (this will be discussed further in Section 5). Thus, the above example also illustrates why M-optimality (or L-optimality) seems so difficult to verify in many problems.

In order to see the meaning of D-optimality, we turn to the notion of a type D test as defined in [11] (we discuss the case where $\sigma$ is unknown, the other case being similar): For fixed $d$, let the function $\beta_\phi(\eta, \tau, \sigma^2)$ be defined by $\beta_\phi(Q_dR\mu, J_d\mu, \sigma^2) = \beta_\phi(\mu, \sigma^2)$ and let $\beta_\phi^{i}(\tau, \sigma^2)$ (resp., $\beta_\phi^{ij}(\tau, \sigma^2)$) be the derivative of $\beta_\phi(\eta, \tau, \sigma^2)$ with respect to the $i$th (resp., $i$th and $j$th) component of $\eta$, evaluated at $\eta = 0$ (these derivatives always exist). A test $\phi$ in $H_d(\alpha)$ is said to be locally (near $H_0$) strictly unbiased if

(a) $\phi \in H_d^1(\alpha)$,

(b) $\beta_\phi^{i}(\tau, \sigma^2) = 0$ for all $i$, $\tau$, and $\sigma$,

(c) the matrix $B_\phi(\tau, \sigma^2) = \|\beta_\phi^{ij}(\tau, \sigma^2)\|$ is positive definite for all $\tau$ and $\sigma$.

Clearly, (c) can be satisfied only if $d \in \Delta'$. Suppose then that $d \in \Delta'$ and that $Q_d$ = identity (we have mentioned the fact that we can make this choice of $Q_d$ when $d \in \Delta'$). For any $\phi$ satisfying (a), (b), (c) just above, $\det B_\phi(\tau, \sigma^2)$ is the Gaussian curvature of the surface given by the graph of $\beta_\phi(\eta, \tau, \sigma^2)$ as a function of $\eta$ for fixed $\tau, \sigma^2$, at $\eta = 0$. A test $\phi$ is defined in [11] to be of type D if it maximizes this curvature for all $\tau$ and $\sigma$, among all locally strictly unbiased tests. This criterion of optimality, although a local one, has certain appealing features; for example, it is invariant under all one-to-one transformations of the parameter space which leave $\eta = 0$ fixed and which at $\eta = 0$ are twice differentiable with non-vanishing Jacobian [11]. Now, since without loss of generality we are taking $Q_d$ = identity, we can compare the behavior of the type D tests for various designs in $\Delta'$, assuming $b_d$ to be the same for all $d$ in $\Delta'$. A design for which the Gaussian curvature at $\eta = 0$ of the test of maximum Gaussian curvature (for a given design) is a maximum (over all designs) is thus, if it exists, that $d$ which maximizes $\max_{\phi_d} \det B_{\phi_d}(\tau, \sigma^2)$ simultaneously for all $\tau, \sigma$. That such a design is precisely one which is D-optimum follows immediately from the following lemma⁴ (there is an obvious analogue when $\sigma^2$ is known):

LEMMA 2.1. For $d$ in $\Delta'$ and $0 < \alpha < 1$, the test $F_{d,\alpha}$ is of type D.

PROOF. $F_{d,\alpha}$ is clearly locally strictly unbiased. We again put $Q_d$ = identity, and a nonsingular linear transformation reduces the proof to the case where $G_d$ = identity (see Section 2A(c)), so that $\nu = \eta$. Wald's theorem can then be stated as

(2.5)    $\int [\beta_{F_{d,\alpha}}(\eta, \tau, \sigma^2) - \alpha]\,A(d\eta) \ge \int [\beta_\phi(\eta, \tau, \sigma^2) - \alpha]\,A(d\eta)$

for every $c > 0$, $\sigma > 0$, and $\phi$ in $H_d^1(\alpha)$, where $A(d\eta)$ is Lebesgue measure on the sphere $\eta'\eta = c$. Noting that

(2.6)    $\int \eta_i\eta_j\,A(d\eta) = K(c, r)$ if $i = j$, and $= 0$ if $i \ne j$,

where $\eta_i$ is the $i$th component of $\eta$ and $K(c, r)$ is positive and depends only on $c$ and $r$, we obtain from (2.5) by normalizing properly and letting $c \to 0$, for any $\phi$ satisfying conditions (a) and (b) above,

(2.7)    $\sum_{i=1}^{r} \beta_{F_{d,\alpha}}^{ii}(\tau, \sigma^2) \ge \sum_{i=1}^{r} \beta_\phi^{ii}(\tau, \sigma^2)$.

Since $B_{F_{d,\alpha}}(\tau, \sigma^2)$ is a constant times the identity in our reduction, using the inequality of the geometric and arithmetic means and the fact that the determinant of a positive-definite matrix is no greater than the product of its diagonal elements, we obtain (omitting some appearances of $\tau, \sigma^2$),

(2.8)    $\det B_\phi(\tau, \sigma^2) \le \prod_{i=1}^{r} \beta_\phi^{ii}(\tau, \sigma^2) \le \left[\sum_{i=1}^{r} \beta_\phi^{ii}/r\right]^r \le \left[\sum_{i=1}^{r} \beta_{F_{d,\alpha}}^{ii}/r\right]^r = \prod_{i=1}^{r} \beta_{F_{d,\alpha}}^{ii} = \det B_{F_{d,\alpha}}(\tau, \sigma^2)$,

which completes the proof.

4 The author understands that Isaacson gave a longer, unpublished proof, earlier.

To summarize, D-optimality and L-optimality, although local properties, seem more reasonable criteria than E-optimality, which is tied to the ad hoc assumption that $F_{d,\alpha}$ should always be used; M-optimality (and to a lesser extent L-optimality) seems difficult to verify in many examples.

2C. Relationship among optimality criteria in symmetric cases. For future reference we state the following simple result (which was alluded to in Section 0 in reference to the relation between [1] and [2]):

LEMMA 2.2. Suppose $b_d$ is constant for $d$ in $\Delta'$. If $d^*$ is D-optimum and $V_{d^*}$ is a multiple of the identity, then $d^*$ is E-optimum and L-optimum.

PROOF. E-optimality is obvious from the nature of $V_{d^*}$. If $d^*$ were not L-optimum, since $F_{d^*,\alpha}$ has property 2A(c), for some other design $d'$ there would by (2.2) be an associated test $\phi_{d'}$ in $H_{d'}^1(\alpha)$ with

(2.9)    $\inf_{\tau, \sigma^2} \det B_{\phi_{d'}}(\tau, \sigma^2) > \det B_{F_{d^*,\alpha}}(\tau, \sigma^2)$

(the right side of (2.9) is constant); by Lemma 2.1, equation (2.9) is a fortiori true if $\phi_{d'}$ is replaced by $F_{d',\alpha}$; this yields the contradiction that $\det V_{d'} < \det V_{d^*}$.

In many examples of Case I where symmetrical designs exist, the condition on $V_{d^*}$ in the hypothesis of Lemma 2.2 will be obvious. In Case II, as discussed in Section 3A, it is often convenient to write the normal equations in the form $C_d\hat t_d = Z_d$, where $C_d$ is a $u \times u$ matrix of rank $u - 1$, $Z_d$ is a $u$-vector of linear forms in $Y$ with covariance matrix $\sigma^2 C_d$, and for any solution $\hat t_d$ of these equations one obtains the best linear estimator of any contrast $\sum_i c_i\mu_i$ with $\sum_i c_i = 0$ by forming $\sum_i c_i\hat t_{di}$, where the $\hat t_{di}$ are the components of $\hat t_d$. Clearly, $P\hat t_d$ is the b.l.e. $t_d$ of $R_{II}\mu$. Hence, if every diagonal element of $C_d$ has the same value and if all off-diagonal elements have the same value, the fact that the first row of the orthogonal matrix $O$ defined in Case II of Section 1 is constant immediately yields the fact (see Section 3A) that $V_d^{-1} = PC_dP'$ is a multiple of the identity, so that Lemma 2.2 may be applicable in such cases. For future reference, we state this simple computation (put $a + (u - 1)c = 0$) in

LEMMA 2.3. If $U$ is a $u \times u$ matrix with diagonal elements $a$ and off-diagonal elements $c$, then

(2.10)    $OUO' = \begin{pmatrix} a + (u-1)c & 0 \\ 0 & (a - c)I_{u-1} \end{pmatrix}$.

We remark that the form of $R_{II}$ (associated with $O$) used here makes computations and proofs simpler and emphasizes more the role of symmetry (e.g., as it appears in the form of $V_d^{-1}$ just noted, when $C_d$ has appropriate symmetry), than would be the case if $R_{II}$ were replaced by a matrix obtained by adjoining a column of 1's and $m - u$ columns of 0's to $I_{u-1}$, as in [1] and [2].


3. Optimality of symmetrical designs. 

3A. Preliminaries. The results of this section will be proved for the case where $\sigma$ is unknown, the other case being handled similarly. The setting of two-way heterogeneity where the LS design is employed is much easier to analyze (and thereby obtain an optimality proof) than other settings considered in Section 3B such as those where the YS and BIBD are used (and the remarks at the end of Section 2 indicate how this analysis can be made even simpler than in [1] and [2]). The reason for this is that in this setting where the LS is used, whether $\mu$ is considered to have $3u$ components ($u$ each for row, column, and treatment effects in the $u \times u$ case) or $3u - 2$ components (to make $X_d'X_d$ nonsingular when $s_d = r = u - 1$), $X_d'X_d$ becomes particularly simple, having large blocks of 1's (each row and column occur together once, etc.) or multiples of an identity (rows by rows, etc.) in the former case, and large blocks of 0's (especially if $O$ is used in reducing $X_d$) and multiples of an identity, in the latter. Other design situations yield more complicated forms of $X_d'X_d$. Therefore, although the examples of Section 3C could be analyzed in a manner analogous to that used for the LS in [1] and [2], it appears algebraically simpler to use the incomplete block design analysis of Bose [4], to which end we now briefly outline the notation. Of course, we are concerned here with the more difficult Case II, which includes most of the important examples.

The form of the $Z_d$ and $C_d$ mentioned in Section 2C depends on the design setting and, in particular, in this section, on whether we are in a setting of one-way or two-way heterogeneity of (for example) soil (since all block sizes will be the same in our example of the former, it could be considered as a special case of the latter under further restrictions on $\mu$). We shall first state the pertinent results which apply in both of these settings, and then specify the particular forms (see [4] for details). The $u \times u$ symmetric matrix $C_d$ has row (or column) sums equal to zero, and the sum of the components of the $u$-vector $Z_d$ is zero. The covariance matrix of $Z_d$ is $\sigma^2 C_d$ and the expected value of $Z_d$ is $C_d\mu^{(1)}$, where $\mu^{(1)}$ is the vector of the first $u$ components of $\mu$. We may assume $d \in \Delta'$, which means the design $d$ is connected and that $C_d$ has rank $u - 1$. If $\hat t_d$ satisfies $C_d\hat t_d = Z_d$ and $P$ is the $(u - 1) \times u$ matrix defined in Case II in Section 1, then $t_d = P\hat t_d$ is the vector of b.l.e.'s of $R_{II}\mu$; the last $u - 1$ rows of the equation $OC_dO'\,O\hat t_d = OZ_d$ are thus $PC_dP't_d = PZ_d$ (the first row and column of $OC_dO'$ are zero), so that $t_d = (PC_dP')^{-1}PZ_d$ (the inverse may be taken for $d$ in $\Delta'$) and thus the components of $t_d$ have covariance matrix $\sigma^2(PC_dP')^{-1}$.

In the one-way heterogeneity setting we have $u$ treatments, to be planted in $b$ blocks; in our example, each block will contain the same number $k$ of plots, one "planting" to be allowed per plot. The component of $Y$ corresponding to an appearance of treatment $i$ in block $j$ has expected value $\mu_i + b_j$; thus, $m = u + b$, with $\mu_{u+j} = b_j$. Let $n_{dij}$ be the number of appearances of treatment $i$ in block $j$. We do not restrict $n_{dij}$ to be 0 or 1, as is often done. Thus, $\Delta$ consists of those $d$ for which $X_d$ is any matrix of 0's and 1's for which each row contains exactly one 1 among the first $u$ elements and one 1 among the last $b$ elements and for which the last $b$ columns each contain $k$ one's; of course, $N = bk$. Let $r_{di} = \sum_j n_{dij}$ = number of replications of treatment $i$, let $T_{di}$ = sum of all components of $Y$ corresponding to treatment $i$, and let $B_{dj}$ = sum of all components of $Y$ arising from block $j$. The $i$th component $Z_{di}$ of $Z_d$ ("adjusted yield of treatment $i$") is $Z_{di} = T_{di} - \sum_j n_{dij}B_{dj}/k$, and the $(i, j)$th component $c_{dij}$ of $C_d$ is

(3.1)    $c_{dij} = \delta_{ij}r_{di} - \lambda_{dij}/k$,

where $\delta_{ij}$ is the Kronecker delta and $\lambda_{dij} = \sum_h n_{dih}n_{djh}$.

In the setting of two-way heterogeneity, we have $u$ treatments and a $k_1 \times k_2$ array of plots, and the expected value of a component of $Y$ corresponding to treatment $i$ in row $j$ and column $h$ is $\mu_i + b_j^{(1)} + b_h^{(2)}$; thus, $m = u + k_1 + k_2$ with $b_j^{(1)} = \mu_{u+j}$ and $b_h^{(2)} = \mu_{u+k_1+h}$. Let $n_{dij}^{(1)}$ (resp., $n_{dih}^{(2)}$) be the number of times treatment $i$ appears in row $j$ (resp., column $h$), and let $T_{di}$ be as before and $B_{dj}^{(1)}$ (resp., $B_{dh}^{(2)}$) be the sum corresponding to the $j$th row (resp., $h$th column). $r_{di}$ is as above, while $\lambda_{dij}^{(q)} = \sum_h n_{dih}^{(q)}n_{djh}^{(q)}$ for $q = 1, 2$. In this case $Z_{di} = T_{di} - \sum_j n_{dij}^{(1)}B_{dj}^{(1)}/k_2 - \sum_h n_{dih}^{(2)}B_{dh}^{(2)}/k_1 + r_{di}\sum_s T_{ds}/k_1k_2$ and

(3.2)    $c_{dij} = \delta_{ij}r_{di} - \lambda_{dij}^{(1)}/k_2 - \lambda_{dij}^{(2)}/k_1 + r_{di}r_{dj}/k_1k_2$.

Many other design settings can be treated similarly; the above two will be used in the examples of Section 3C to illustrate our methods of proving optimality.

3B. Algebraic results. We now demonstrate the algebraic results used in proving optimality in the examples of Section 3C and which will be useful in other examples of Case II. The results proved here are meant to apply elsewhere than in the settings of Section 3A. We suppose in the present Section 3B that we are given a class $\{K_d, d \in \Delta'\}$ of $u \times u$ symmetric nonnegative definite matrices of rank $u - 1$ with row and column sums zero and define $W_d = PK_dP'$ (in our applications, $W_d = V_d^{-1}$). The elements of $O$, $K_d$, and $W_d$ will be denoted by $o_{ij}$, $k_{dij}$, and $w_{dij}$, respectively. In Lemma 3.2 we consider an orthogonal matrix $\bar O = \|\bar o_{ij}\|$, not necessarily $O$, and a diagonal matrix $D = \|d_{ij}\|$.

Our first lemma merely translates into terms of $K_d$ the obvious fact that, if $W_{d^*}$ has equal eigenvalues and if the sum of the eigenvalues (= trace) of $W_d$ is a maximum for $d = d^*$, then the product of eigenvalues (= determinant) of $W_d$ is a maximum for $d = d^*$.

LEMMA 3.1. If all diagonal elements of $K_{d^*}$ are equal and all off-diagonal elements of $K_{d^*}$ are equal and $\sum_i k_{dii}$ is a maximum for $d = d^*$, then $\det W_d$ is a maximum for $d = d^*$.

PROOF. Since $o_{1j} = 1/\sqrt{u}$ and $\sum_{i,j} k_{dij} = 0$, the upper left-hand element of $OK_dO'$ is zero. Since the traces of $OK_dO'$ and $K_d$ are equal, we conclude that the traces of $K_d$ and $W_d$ are equal, so that the trace of $W_d$ is a maximum for $d = d^*$. The result now follows from Lemma 2.3 (follow the steps of (2.8) with $W_d$ for $B_\phi$ and $W_{d^*}$ for $B_{F_{d,\alpha}}$).

We shall actually prove in Theorems 3.1 and 3.2 that the trace of the matrix $PC_dP'$ is a maximum and that all eigenvalues are equal when $d$ is a BBD or GYS, so that Lemma 3.1 is relevant. However, there are settings in which the next three lemmas are more useful for proving D- or E-optimality directly when the hypothesis of Lemma 3.1 is difficult to verify or is false.

LEMMA 3.2. For $u > 1$, if $\bar O$ is orthogonal $u \times u$, $D$ is diagonal $u \times u$, $K$ is symmetric nonnegative definite $u \times u$ with row and column sums zero, and $\bar OD\bar O' = K$, then

(3.3)    $\left(\frac{u-1}{u}\right)^{u}\left(\prod_{i=1}^{u-1} d_{ii}\right)^{u/(u-1)} \le \prod_{i=1}^{u} k_{ii}$.

PROOF. We assume $d_{uu} = 0 < d_{ii}$ for $i < u$, or the result is trivial. Since, then,

(3.4)    $0 = \sum_{i,j} k_{ij} = \sum_{i}\sum_{j}\sum_{s} \bar o_{is}d_{ss}\bar o_{js} = \sum_{s} d_{ss}\left(\sum_{i} \bar o_{is}\right)^2$,

we conclude that the first $u - 1$ columns of $\bar O$ are orthogonal to the vector of ones. Hence, $\bar o_{iu} = 1/\sqrt{u}$ (or its negative, which is treated in the same way).

Let the coordinates of a point $\epsilon$ in $u(u - 1)$-dimensional Euclidean space be denoted by $\epsilon_{ij}$ ($i = 1, \cdots, u$; $j = 1, \cdots, u - 1$), and let $B$ be the set of points $\epsilon$ in this space for which all $\epsilon_{ij} \ge 0$, for which $\sum_j \epsilon_{ij} = (u - 1)/u$ for all $i$, and for which $\sum_i \epsilon_{ij} = 1$ for all $j$. We shall prove below that $\epsilon$ in $B$ implies

(3.5)    $\prod_{i=1}^{u}\left(\sum_{j=1}^{u-1} \epsilon_{ij}d_{jj}\right) \ge \left(\frac{u-1}{u}\right)^{u}\left(\prod_{i=1}^{u-1} d_{ii}\right)^{u/(u-1)}$;

since the left side of (3.5) with $\epsilon_{ij} = \bar o_{ij}^2$ gives the right side of (3.3) and since the restrictions on the $\epsilon_{ij}$ in $B$ must be satisfied by the $\bar o_{ij}^2$ (the orthogonality restrictions on the $\bar o_{ij}$ are omitted in defining $B$), (3.5) implies (3.3).

Call the left side of (3.5) $f(\epsilon)$. It is easy to verify that $-\log f(\epsilon)$ is convex in $\epsilon$ on $u(u - 1)$-space, and hence on $B$. Moreover, $B$ is a convex body in $u(u - 1)$-space, and any extreme point of $B$ is either

(3.6)    $\begin{pmatrix} \epsilon_{11} & \cdots & \epsilon_{1,u-1} \\ \vdots & & \vdots \\ \epsilon_{u1} & \cdots & \epsilon_{u,u-1} \end{pmatrix} = \begin{pmatrix} \frac{u-1}{u}\,I_{u-1} \\ \frac{1}{u} \ \cdots \ \frac{1}{u} \end{pmatrix}$

or is obtained by permuting the rows of the matrix on the right side of (3.6). Since a convex function on a convex set attains its maximum at an extreme point, we conclude that the minimum of $f$ is attained at one of these extreme points. But $f$ has the same value at any of these extreme points, namely,

(3.7)    $\min_B f(\epsilon) = \left(\sum_{i=1}^{u-1} d_{ii}/u\right)\prod_{i=1}^{u-1}\left(\frac{u-1}{u}\,d_{ii}\right)$.

Thus, it remains only to prove that the right side of (3.7) is no less than the right side of (3.5), i.e., that

(3.8)    $\left(\prod_{i=1}^{u-1} d_{ii}\right)^{1/(u-1)} \le \sum_{i=1}^{u-1} d_{ii}/(u - 1)$;

but (3.8) is merely the well-known inequality between the geometric and arithmetic means.

The form of Lemma 3.2 which is useful in many applications is the following:

LEMMA 3.3. If $\prod_i k_{dii}$ is a maximum for $d = d^*$ and if $K_{d^*}$ has all diagonal elements equal and all off-diagonal elements equal, then $\det W_d$ is a maximum for $d = d^*$.

PROOF. We use Lemma 3.2 with the product on the left side of (3.3) going from 2 to $u$, in order to conform to previous notation. In this form, with $\bar O = O$, it follows from Lemma 2.3 that the left and right sides of (3.3) are equal for $K = K_{d^*}$. Hence, from Lemma 3.2, $\prod_i w_{dii}$ is a maximum for $d = d^*$. Since $\prod_i w_{dii} \ge \det W_d$ with equality for the diagonal matrix $W_{d^*}$, the proof is complete.

The following lemma could be used in the case of the YS, and in more complicated problems where D-optimality is hard to prove or false, to prove E-optimality directly (i.e., without the use of Lemma 2.2):

LEMMA 3.4. For $u > 1$, if $m(W_d)$ is the minimum eigenvalue of $W_d$, then

(3.9)    $m(W_d) \le \frac{u}{u-1}\min_i k_{dii}$;

if all diagonal elements of $K_d$ are equal and all off-diagonal elements are equal, equality holds in (3.9).

PROOF. Let $\delta_i$ be a $u$-vector with $i$th element one and all other elements zero. Let $\xi_i = P\delta_i$. Clearly, $\sqrt{u/(u-1)}\,\xi_i$ has unit length. Hence,

(3.10)    $k_{dii} = \delta_i'K_d\delta_i = (O\delta_i)'(OK_dO')(O\delta_i) = \xi_i'W_d\xi_i \ge \frac{u-1}{u}\min_{a'a=1} a'W_da = \frac{u-1}{u}\,m(W_d)$,

which proves (3.9); the result on equality follows from Lemma 2.3.

The results for Case I analogous to those proved for Case II in this subsection are trivial (since in Case I the analogue of $K_d$ will be nonsingular and $K_{d^*}$ will be a multiple of the identity), and will be omitted.

3C. Examples. (1). Optimality of BIBD's. In the setting of one-way heterogeneity described in Section 3A (with $u > 1$), suppose $b$, $u$, and $k$ to be such that there exists a design $d^*$ for which all $n_{d^*ij}$ are $k/u$ if $k/u$ is an integer and are either of the two integers closest to $k/u$ otherwise, for which all $r_{d^*i}$ are equal, and for which all $\lambda_{d^*ij}$ are equal for $i \ne j$. Such a design is called a BIBD if $k < u$, but we do not impose this last restriction here, and therefore call such a design a Balanced Block Design (BBD). (For example, if $b = 2$, $u = 2$, $k = 3$, such a $d^*$ is that for which $n_{d^*11} = n_{d^*22} = 1$ and $n_{d^*12} = n_{d^*21} = 2$.) Our result is:

THEOREM 3.1. If a BBD $d^*$ exists, it is D-optimum, E-optimum, and L-optimum.

PROOF. From (3.1) we have

(3.11)    $\sum_{i=1}^{u} c_{dii} = N - \sum_{i,j} n_{dij}^2/k$;

since $\sum_{i,j} n_{dij} = N$, it is clear that (3.11) is a maximum for $d = d^*$. The result now follows from Lemma 3.1 and Lemma 2.2.
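The trace computation in this proof can be illustrated on the small example given before the theorem ($b = 2$, $u = 2$, $k = 3$). The following is a sketch only: the helper names `c_matrix_one_way` and `trace` are hypothetical, and the matrix is built directly from formula (3.1) with $n[i][j]$ standing for $n_{dij}$. Exact rational arithmetic is used so that the comparison of traces is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

def c_matrix_one_way(n, k):
    """C_d of (3.1): c_ij = delta_ij * r_i - lambda_ij / k, with n[i][j] = n_dij."""
    u, b = len(n), len(n[0])
    # Every block must contain exactly k plots.
    assert all(sum(n[i][j] for i in range(u)) == k for j in range(b))
    r = [sum(row) for row in n]                        # replications r_di
    lam = [[sum(n[i][h] * n[j][h] for h in range(b))   # lambda_dij
            for j in range(u)] for i in range(u)]
    return [[(r[i] if i == j else 0) - Fraction(lam[i][j], k)
             for j in range(u)] for i in range(u)]

def trace(c):
    return sum(c[i][i] for i in range(len(c)))

# The BBD of the text: treatment 1 once and treatment 2 twice in block 1,
# and the reverse in block 2.
N_STAR = [[1, 2], [2, 1]]
C_STAR = c_matrix_one_way(N_STAR, 3)

# Traces of C_d over all designs with u = 2 treatments, b = 2 blocks, k = 3.
ALL_TRACES = [trace(c_matrix_one_way([[a, c], [3 - a, 3 - c]], 3))
              for a, c in product(range(4), repeat=2)]
```

Here `C_STAR` has diagonal elements $4/3$ and off-diagonal elements $-4/3$, and its trace $8/3$ equals the largest value in `ALL_TRACES`, in agreement with (3.11).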

(2). Optimality of YS's. In the setting of two-way heterogeneity described in Section 3A (with $u > 1$), suppose $k_1$, $k_2$, and $u$ to be such that there exists a design $d^*$ for which all $r_{d^*i}$ are equal, for which all $\lambda_{d^*ij}^{(1)}$ are equal for $i \ne j$, for which all $\lambda_{d^*ij}^{(2)}$ are equal for $i \ne j$, and for which all $n_{d^*ij}^{(q)}$ are equal to $k_q/u$ if $k_q/u$ is an integer and are either of the two integers closest to $k_q/u$ otherwise ($q = 1, 2$). Thus, $d^*$ is a BBD when either the rows or the columns are considered to be the blocks. Such a design $d^*$ is usually called a YS if $k_1 < u$ (and $k_2/u$ is an integer); we do not impose this condition, and shall hence call such a design $d^*$ a Generalized Youden Square (GYS). (For example, if $u = 2$, $k_1 = 4$, $k_2 = 3$, such a design $d^*$ is easily constructed.) If $k_1 = k_2 = u$, such a $d^*$ is of course a LS. Our result is:

THEOREM 3.2. If $k_1/u$ or $k_2/u$ is an integer and if a GYS $d^*$ exists, then $d^*$ is D-optimum, E-optimum, and L-optimum.

PROOF. We shall show that $\sum_i c_{dii}$ is a maximum for $d = d^*$; Lemma 3.1 then yields the desired result. In this proof only we write $[x]$ = greatest integer $\le x$. Let $r$ be an integer. Subject to the restrictions that $\sum_1^k m_i = r$ and that all $m_i$ are integers, the expression $\sum_1^k m_i^2$ is minimized by taking $k - r + k[r/k]$ of the $m_i$ to be $[r/k]$ and $r - k[r/k]$ of them to be $[r/k] + 1$, the corresponding minimum of $\sum m_i^2$ being $r + (2r - k)[r/k] - k[r/k]^2 = h(r, k)$ (say). We may assume


5 The Editor has informed the author that E-optimality of the BIBD's (as a subclass of the BBD's) has been proved independently by V. L. Mote, and that the minimization of the average variance (see numbered paragraph 2 of Section 5) and of the generalized variance (i.e., the attainment of D-optimality) achieved by the BIBD's and YS's (a subclass of the GYS's) has been proved independently by A. M. Kshirsagar; both of these authors prove their results under the restriction that the $n_{dij}$ and $n_{dij}^{(q)}$ are all 0 or 1. Under this restriction, these special cases of the results of this paper are a consequence of the following line of argument: the trace of $C_d$ is the same for all $d$, and the results follow at once from the symmetry of the BIBD and YS.

$k_1/u$ is an integer. From (3.2) we have, for any $d$,

(3.12)    $k_1k_2\left(k_1k_2 - \sum_i c_{dii}\right) \ge \sum_i \{k_2h(r_{di}, k_2) - r_{di}^2\} + \sum_i k_1h(r_{di}, k_1)$,

with equality in the case of a GYS. The theorem will be proved if we show that each of the two sums on the right side of (3.12) attains its minimum for $d = d^*$. Now, $h(r, k) \ge r^2/k$, since the latter is the minimum of $\sum m_i^2$ subject to $\sum m_i = r$ without the restriction that the $m_i$ be integers. Hence, the first sum on the right side of (3.12) is at least zero. Moreover, this lower bound is achieved by the first sum on the right in (3.12) when $d = d^*$, since $r_{d^*i}/k_2 = k_1/u$ is an integer. It remains to consider the last sum of (3.12). We shall show that, subject to $\sum_1^m z_i = c$, the expression

(3.13)    $G(z_1, \cdots, z_m) = \sum_{i=1}^{m} \{(2z_i - 1)[z_i] - [z_i]^2\}$

is a minimum when all $z_i$ are equal; putting $z_i = r_{di}/k_1$, we see that this will yield the desired conclusion regarding the last sum of (3.12). The proof regarding (3.13) is by induction: assuming the conclusion to be true of $m = M$, in proving the case $m = M + 1$ we may put $z_1 = \cdots = z_M = s$ and $z_{M+1} = c - Ms$ in (3.13). The resulting expression is continuous in $s$ and, except on a discrete set, has a derivative with respect to $s$ which is equal to $2M([s] - [c - Ms])$. The latter is $\le 0$ if $s < c/(M + 1)$ and is $\ge 0$ if $s > c/(M + 1)$, so that $s = c/(M + 1)$ yields a minimum. This completes the proof of Theorem 3.2.
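The closed form used for $h(r, k)$ in the proof above can be compared with a direct search. The sketch below is illustrative: the function names are hypothetical, and the brute-force search is restricted to nonnegative integers $m_i$ (which suffices, since the $m_i$ of the proof are counts) and to small $r$ and $k$.

```python
from itertools import product

def h(r, k):
    """Closed form r + (2r - k)[r/k] - k[r/k]^2, with [x] the integer part."""
    q = r // k
    return r + (2 * r - k) * q - k * q * q

def min_sum_squares(r, k):
    """Brute-force minimum of sum m_i^2 over nonnegative integers
    m_1, ..., m_k with m_1 + ... + m_k = r."""
    return min(sum(m * m for m in ms)
               for ms in product(range(r + 1), repeat=k)
               if sum(ms) == r)
```

For instance, $h(5, 3) = 9$, attained by $(2, 2, 1)$, and $9 \ge 25/3$ as the inequality $h(r, k) \ge r^2/k$ requires.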

We remark that, without the assumption that $k_1/u$ or $k_2/u$ is an integer, the above proof fails and Lemma 3.3 also fails to be applicable generally. To see this, consider the case $k_1 = k_2 = 6$, $u = 4$. A GYS $d^*$ exists here, e.g., that one whose successive rows are (134324), (412233), (241342), (124123), (313412), (321441). We obtain $c_{d^*ii} = 25/4$ for all $i$. Let $d'$ be the design whose rows are (133442), (213344), (421334), (442133), (344213), and (334421). Then $c_{d'11} = c_{d'22} = 5$, $c_{d'33} = c_{d'44} = 8$, $c_{d'12} = -1$, $c_{d'34} = -4$, and all other $c_{d'ij} = -2$. Thus, we obtain $\sum_i c_{d'ii} = 26 > 25 = \sum_i c_{d^*ii}$ and even $\prod_i c_{d'ii} = 1600 > (25/4)^4 = \prod_i c_{d^*ii}$. However, $\det V_{d'}^{-1} = 576 < (25/3)^3 = \det V_{d^*}^{-1}$. Thus, between the designs $d^*$ and $d'$, the former is D-optimum, although Lemmas 3.1 and 3.3 cannot be used to prove it. Lemma 3.4 could still have been used to prove the E-optimality of $d^*$ directly.
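The figures quoted in this remark can be reproduced from formula (3.2). The following sketch makes several assumptions that should be read as such: the helper names are hypothetical; the first row of $d'$ is taken to be (133442), which is the reading under which the stated values of $c_{d'ij}$ come out; and $\det V_d^{-1} = \det(PC_dP')$ is computed as $u$ times a principal minor of order $u - 1$ of $C_d$, which is valid for a symmetric matrix with zero row and column sums.

```python
from fractions import Fraction

def c_matrix_two_way(rows, u):
    """C_d of (3.2) for a k1 x k2 array of treatment labels 1..u."""
    k1, k2 = len(rows), len(rows[0])
    cols = list(zip(*rows))
    n1 = [[row.count(t) for row in rows] for t in range(1, u + 1)]  # row counts
    n2 = [[col.count(t) for col in cols] for t in range(1, u + 1)]  # column counts
    r = [sum(n1[i]) for i in range(u)]                              # replications

    def lam(n, i, j):
        return sum(a * b for a, b in zip(n[i], n[j]))

    return [[(r[i] if i == j else 0)
             - Fraction(lam(n1, i, j), k2)
             - Fraction(lam(n2, i, j), k1)
             + Fraction(r[i] * r[j], k1 * k2)
             for j in range(u)] for i in range(u)]

def det(m):
    """Determinant by cofactor expansion (adequate for small exact matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def det_w(c):
    """Product of the nonzero eigenvalues of C_d, i.e. det(P C_d P')."""
    return len(c) * det([row[:-1] for row in c[:-1]])

def parse(strings):
    return [[int(ch) for ch in s] for s in strings]

D_STAR = parse(["134324", "412233", "241342", "124123", "313412", "321441"])
D_PRIME = parse(["133442", "213344", "421334", "442133", "344213", "334421"])
C_STAR = c_matrix_two_way(D_STAR, 4)
C_PRIME = c_matrix_two_way(D_PRIME, 4)
```

With these inputs `det_w(C_PRIME)` is 576 and `det_w(C_STAR)` is $15625/27 = (25/3)^3$, while the traces are 26 and 25 respectively, so $d'$ has the larger trace but the smaller determinant.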

(3) Other examples. Many other design settings can be analyzed in a manner differing only slightly from the above examples and we mention but a few. One can treat similarly problems where the test concerns the $b_j^{(1)}$ and $b_h^{(2)}$ of Section 3A. Problems involving Graeco-Latin Squares or higher Latin Squares, with or without replications, admit similar treatments. Higher-dimensional analogues (more than two directions of heterogeneity) can also be considered in a like fashion, as can complete or partial factorial arrangements. Many of the Case I analogues, such as the analogue of the BIBD treatment which assumes the $b_j$ to be known, are trivial.

Other problems such as those for which E-optimality is considered in [2] (e.g., Hotelling's weighing problem and certain problems in the analysis of covariance) could be considered regarding D- and L-optimality by similar methods.

The treatment of some problems is in part parallel, but entails other considerations in addition to symmetry; such a problem is to test whether a regression function $\sum_{i=1}^{m} \mu_if_i(x)$ is actually such that $\mu_1 = \cdots = \mu_r = 0$, where the $f_i$ are given and $N$ $x$'s must be chosen from a given region of some space. (Many problems in the analysis of covariance involve similar considerations.) D- and E-optimality are also relevant in estimation problems (see Section 5.2).

The consideration of some of these other examples will appear elsewhere, in a paper by J. Wolfowitz and the author.


4. Nonoptimality of symmetrical nonrandomized designs among randomized 
designs. ' 

1A. Case 1. We consider here the simplest general setting of Case I, namely, 
the extension of the example of Section 1 to more observations N and more 
treatments uv. Other examples, such as the Case I analogues of the examples of 
Section 4B, have parallel analyses, and we omit them. We shall carry out the 
treatment when o is unknown, the treatment when o is known being similar. 
The underlying probabilistic property (of the normal distribution) which is 
relevant here will now be stated ina lemma. Let U/o* have a non-central x’ dis- 
tribution with .V; degrees of freedom (d.f) and non-central parameter \ = 
EU/o — N,, and let V/o° have the central x’ distribution with N» d.f., with 
U’ and V independent. Let Py,.., (A; @) denote the power function of the F-test 
of size @ for testing \ = 0 based on Nel’ /N,V, and, as in (1.3), let gy 
denote the derivative of this power function with respect to \ at \ = 0. 

LEMMA 4.1. If $N_1 \le N_1'$ and $N_1 + N_2 \ge N_1' + N_2'$ with at least one
of these a strict inequality, then
$\beta_{N_1,N_2}(\lambda; \alpha) > \beta_{N_1',N_2'}(\lambda; \alpha)$ for
$\lambda > 0$ and $0 < \alpha < 1$, and
$g_{N_1,N_2}(\alpha) > g_{N_1',N_2'}(\alpha)$ for $0 < \alpha < 1$.

PROOF. Let $U/\sigma^2$ have a $\chi^2$ distribution with parameter $\lambda$
and $N_1$ d.f., and let $V_1/\sigma^2$, $V_2/\sigma^2$, and $V_3/\sigma^2$ have
central $\chi^2$ distributions with $N_2'$, $N_1' - N_1$, and
$N_1 + N_2 - N_1' - N_2'$ d.f., respectively (if any of the d.f.'s is 0, so is
the corresponding $V_i$). $U$, $V_1$, $V_2$, $V_3$ are independent. For testing
the hypothesis $\lambda = 0$ against alternatives $\lambda > 0$ based on $U$,
$V_1$, $V_2$, $V_3$, it is easy to prove that the F-test based on
$N_2 U/N_1(V_1 + V_2 + V_3)$ is UMP unbiased of size $\alpha$ and is of type A,
and is the unique (up to sets of measure zero) test with each of these
properties; in particular, this is true in comparison with the F-test based on
$N_2'(U + V_2)/N_1' V_1$, which proves the lemma.

The above lemma indicates both that the numerator d.f. should be as small
as possible without affecting $\lambda$, which is also true when $\sigma$ is
known, and also that for fixed $N_1 + N_2$, decreasing $N_1$ helps even more if
$\sigma$ is unknown, since $N_2$ is increased (compare (4.5) and (4.7) below).

We now consider the following problem: $Y_{ij}$ are independent and normally
distributed random variables with unknown mean $\mu_i$
($j = 1, \cdots, n_i$; $i = 1, \cdots, u$) and variance $\sigma^2$ (we use a
convenient notation for the example, rather than that introduced in Section 1).
The problem is to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_u = 0$, and a design
$d$ in $\Delta$ is a specification of nonnegative integers $n_i$ whose sum is
$N$. For any such $d$, we denote by $M(d)$ the set of $i$ for which $n_i > 0$;
by $k(d)$, the number of integers in $M(d)$; by $\pi d$, the design associated
with the values $n_i = n^*_{\pi i}$ when $d$ is associated with the values
$n_i = n^*_i$, where $\pi$ is any element of the symmetric group $S_u$ on $u$
symbols; by $\delta_d$, the design in $\Delta_R$ which assigns probability
$1/u!$ to each $\pi d$ for $\pi$ in $S_u$; by $\phi_{d,\alpha}$ the test
associated with $\delta_d$ which is obtained by using the appropriate F-test of
size $\alpha$ with whatever $\pi d$ is chosen by $\delta_d$. We shall also use
the symbol $a_\phi(c)$ of (2.2), with $\psi(\mu) = \sum \mu_i^2$, and shall
denote by $a'_\phi$ its derivative with respect to $c$ at $c = 0$. We shall
also use the symbols $g_{ij}(\alpha)$ introduced in (1.3). Our result, which
implies that the “symmetrical” design associated with $k(d) = u$ and all $n_i$
equal (or as nearly so as possible) is not $L_\alpha$-optimum in $\Delta_R$,
and that the $\delta_d$ associated with the $d$ for which $n_1 = N$ (this
$\delta_d$ chooses each $i$ with probability $1/u$ and takes all $Y_{ij}$ with
the chosen $i$) is locally best among the $\delta_d$, is the following:

THEOREM 4.1. For every $d$, $\alpha$, and $c$,

(4.1)  $a_{F_{d,\alpha}}(c) \le a_{\phi_{d,\alpha}}(c)$;

$a'_{\phi_{d,\alpha}}$ is strictly decreasing in $k(d)$, and the same is true
of $a_{\phi_{d,\alpha}}(c)$ for all $c$ in some neighborhood of $c = 0$.

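PROOF. (4.1) is trivial, and we proceed to the rest of the proof. The numerator
$t_d' V_d^{-1} t_d$ of $F_{d,\alpha}$ is of course

    $U_d = \sum_{i \in M(d)} n_i \Bigl( \sum_{j=1}^{n_i} Y_{ij}/n_i \Bigr)^2$,

and $U_d/\sigma^2$ has a $\chi^2$ distribution with $k(d)$ d.f. and non-central
parameter

    $\sum_{i \in M(d)} n_i \mu_i^2/\sigma^2$.

The denominator of $F_{d,\alpha}$ has $N - k(d)$ d.f. Write
$\lambda = \sum_i \mu_i^2/\sigma^2$. From (1.3) we have, as $\lambda \to 0$,

(4.2)  $\beta_{\phi_{d,\alpha}}(\mu, \sigma)
          = \sum_{\pi \in S_u} \beta_{F_{\pi d,\alpha}}(\mu, \sigma)/u!$
          $= \sum_{\pi \in S_u} \bigl[\alpha + g_{k(d),N-k(d)}(\alpha) \sum_i n_{\pi i} \mu_i^2/\sigma^2 + O(\lambda^2)\bigr]/u!$
          $= \alpha + g_{k(d),N-k(d)}(\alpha) \sum_i \bigl(\sum_{\pi \in S_u} n_{\pi i}/u!\bigr) \mu_i^2/\sigma^2 + O(\lambda^2)$
          $= \alpha + \frac{N}{u}\, g_{k(d),N-k(d)}(\alpha)\, \lambda + O(\lambda^2)$.

The desired conclusion now follows from Lemma 4.1.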
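
The averaging step in this proof, which replaces $\sum_\pi n_{\pi i}/u!$ by $N/u$, can be checked by brute force. The following is a minimal sketch in exact rational arithmetic; the design `n` and the values `mu_sq` (standing for $\mu_i^2/\sigma^2$) are hypothetical inputs chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def averaged_noncentrality(n, mu_sq):
    """Average over all permutations pi of sum_i n[pi(i)] * mu_sq[i]."""
    u = len(n)
    total = Fraction(0)
    for pi in permutations(range(u)):
        total += sum(Fraction(n[pi[i]]) * mu_sq[i] for i in range(u))
    return total / factorial(u)

# Hypothetical design with N = 6 observations on u = 3 treatments, k(d) = 2.
n = [4, 2, 0]
# Hypothetical values of mu_i^2 / sigma^2.
mu_sq = [Fraction(1), Fraction(1, 4), Fraction(9)]

N, u = sum(n), len(n)
lam = sum(mu_sq)  # lambda = sum_i mu_i^2 / sigma^2

# The permutation average equals (N/u) * lambda, whatever the n_i are.
print(averaged_noncentrality(n, mu_sq), Fraction(N, u) * lam)
```

Because the average depends on the design only through $N$, the comparison between designs rests entirely on the factor $g_{k(d),N-k(d)}(\alpha)$.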

Existing tables and charts of the power functions of the F-test and
$\chi^2$-test are presented in such forms (in terms of
$\sqrt{\lambda/(k(d) + 1)}$, usually in inverted form and with wide spacing of
arguments) as to make accurate comparisons of the $\beta_{\phi_{d,\alpha}}$
difficult. This difficulty is made the worse by the fact that
$\beta_{\phi_{d,\alpha}}$ is not (with an obvious exception) constant on the
contour $\lambda$ = constant, making it somewhat of a task to obtain
$a_{\phi_{d,\alpha}}(c)$. It is not true, as might be supposed, that this
minimum power on the contour $\lambda$ = constant is always attained for a
$\mu$ with all components equal, or else is always attained for a $\mu$ with
all components except one equal to zero. To see this, consider the problem of
Section 1 ($N = u = 2$, $\sigma$ known). Let $C_\alpha$ be the value such that,
if $Y$ is a normal random variable with 0 mean and unit variance, then
$P\{|Y| > C_\alpha\} = \alpha$. A direct computation of the power function of
$\delta$ near $\mu_1^2 + \mu_2^2 = 0$ yields

(4.3)  $\beta_\delta(\mu) = \alpha + \frac{C_\alpha \exp(-C_\alpha^2/2)}{2\sqrt{2\pi}}
          \bigl\{2(\mu_1^2 + \mu_2^2) + (C_\alpha^2 - 3)(\mu_1^4 + \mu_2^4)/3 + O(\lambda^3)\bigr\}$.

Hence, when $c$ is sufficiently small, the minimum of $\beta_\delta(\mu)$ on
the contour $\lambda = c$, neglecting the term $O(\lambda^3)$, is located at
$\mu_1 = \sqrt{c}$, $\mu_2 = 0$ (or $\mu_2 = \sqrt{c}$, $\mu_1 = 0$) if
$C_\alpha \le \sqrt{3}$ and at $\mu_1 = \mu_2 = \sqrt{c/2}$ if
$C_\alpha \ge \sqrt{3}$. When we include terms of higher order in $\mu$, it is
no longer even evident that the minimum must be attained at one of these two
values of $\mu$.
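
The expansion (4.3) can be compared with the exact power in this example. The sketch below assumes $\sigma = 1$ and takes $\delta$ to be the design that picks one of the two treatments with probability 1/2, puts both observations on it, and rejects when $\sqrt{2}\,|\bar{Y}| > C_\alpha$; the function names are illustrative only.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def exact_power(mu1, mu2, alpha):
    """Exact power of delta for N = u = 2, sigma = 1 (assumed setup, see lead-in)."""
    c = Z.inv_cdf(1 - alpha / 2)          # C_alpha

    def two_sided(theta):                 # P(|Z + theta| > C_alpha)
        return Z.cdf(-c - theta) + 1 - Z.cdf(c - theta)

    # sqrt(2) * mean of two observations has mean sqrt(2) * mu_i and unit variance.
    return 0.5 * (two_sided(sqrt(2) * mu1) + two_sided(sqrt(2) * mu2))

def expansion(mu1, mu2, alpha):
    """Right side of (4.3) with the O(lambda^3) remainder dropped."""
    c = Z.inv_cdf(1 - alpha / 2)
    lead = c * exp(-c * c / 2) / (2 * sqrt(2 * pi))
    quartic = (c * c - 3) * (mu1 ** 4 + mu2 ** 4) / 3
    return alpha + lead * (2 * (mu1 ** 2 + mu2 ** 2) + quartic)

# Compare the two on the contour lambda = c for a small c.
c = 0.01
for alpha in (0.05, 0.20):
    axis = (sqrt(c), 0.0)                 # one component nonzero
    diag = (sqrt(c / 2), sqrt(c / 2))     # all components equal
    print(alpha, exact_power(*axis, alpha), exact_power(*diag, alpha),
          expansion(*axis, alpha), expansion(*diag, alpha))
```

For $\alpha = .05$ one has $C_\alpha > \sqrt{3}$ and the smaller power on the contour occurs at equal components; for $\alpha = .20$ one has $C_\alpha < \sqrt{3}$ and it occurs on an axis.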

We see from (4.3) that
$g_{1,\infty}(\alpha) = (2\pi)^{-1/2} C_\alpha \exp(-C_\alpha^2/2)$ and it is
not hard to show that $g_{2,\infty}(\alpha) = -\alpha (\log \alpha)/2$ (see
[12], equation (6.27), where $\lambda$ is our $\lambda/2$). Thus, a comparison
of $a'_{\phi_{d,\alpha}}$ for $k(d) = 1$ and 2 is given in this example by the
following table:

    $\alpha$    $g_{1,\infty}(\alpha)$    $g_{2,\infty}(\alpha)$
      .01              .037                      .023
      .05              .114                      .075
      .10              .175                      .115
      .20              .220                      .161
      .30              .242                      .181
      .50              .214                      .173
      .90              .050                      .047
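
The two columns can be recomputed directly from the closed forms for $g_{1,\infty}$ and $g_{2,\infty}$ quoted just before the table. This is a sketch using only the Python standard library; the function names are illustrative, and recomputed values may differ slightly from the printed entries.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

def g1_inf(alpha):
    """g_{1,inf}(alpha) = (2*pi)^(-1/2) * C_alpha * exp(-C_alpha^2 / 2)."""
    c = NormalDist().inv_cdf(1 - alpha / 2)   # C_alpha: P(|Y| > C_alpha) = alpha
    return c * exp(-c * c / 2) / sqrt(2 * pi)

def g2_inf(alpha):
    """g_{2,inf}(alpha) = -alpha * log(alpha) / 2."""
    return -alpha * log(alpha) / 2

for alpha in (0.01, 0.05, 0.10, 0.20, 0.30, 0.50, 0.90):
    print(f"{alpha:.2f}  {g1_inf(alpha):.3f}  {g2_inf(alpha):.3f}")
```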
The following lemma shows that, as $\alpha \to 0$, the ratio of the second to
third column above goes to 2 and, more generally, that
$g_{i,\infty}(\alpha)/g_{j,\infty}(\alpha) \to j/i$ (this gives a comparison of
the various $\delta_d$ for general $N$ and $u$ and for various $k(d)$ when
$\sigma^2$ is known, as $\alpha \to 0$; see Lemma 4.3 for the case when
$\sigma^2$ is unknown):

LEMMA 4.2. As $\alpha \to 0$,

(4.5)  $g_{j,\infty}(\alpha) = -[1 + o(1)]\, \alpha (\log \alpha)/j$.


PROOF. Fix $j$. Let $k_\alpha$ be such that if $Y$ is a random variable with
central $\chi^2$ distribution with $j$ d.f., then $P\{Y > k_\alpha\} = \alpha$.
Let $f_\lambda$ be the $\chi^2$ density function with $j$ d.f. and non-central
parameter $\lambda$. A simple calculation shows that
$\partial f_\lambda(u)/\partial\lambda$ at $\lambda = 0$ is
$f_0(u)[(u/2j) - 1/2]$. Hence, as $k_\alpha \to \infty$,

(4.6)  $g_{j,\infty}(\alpha) = \int_{k_\alpha}^{\infty} f_0(u)[(u/2j) - 1/2]\, du
          = [1 + o(1)]\, f_0(k_\alpha) k_\alpha/j$,

by partial integration. On the other hand, an integration by parts shows that
$\alpha = 2 f_0(k_\alpha)[1 + o(1)]$ as $k_\alpha \to \infty$, and hence that
$k_\alpha = -2[1 + o(1)] \log \alpha$. This completes the proof.

4B. Case II. We again treat the case where $\sigma$ is unknown, the other case
being handled similarly (mainly, use Lemma 4.2 for Lemma 4.3). We first prove
two simple lemmas.

LEMMA 4.3. As $\alpha \to 0$,

(4.7)  $g_{ji}(\alpha) = (i\alpha/2j) + o(\alpha)$.

(This does not contradict (4.5), since $i$ is fixed in (4.7).)

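PROOF. Fix $j$ and $i$. Let $h_\alpha$ be such that if $Y$ has a central
F-distribution with $j$ and $i$ d.f., then $P\{Y > h_\alpha\} = \alpha$. Let
$G_\lambda$ be the F density function with $j$ and $i$ d.f. and non-central
parameter $\lambda$. From [12], equation (6.29) (with $\lambda$ there replaced
by our $\lambda/2$), it is easy to compute that
$\partial G_\lambda(u)/\partial\lambda$ at $\lambda = 0$ is
$G_0(u)[(j + i)u/(i + ju) - 1]/2$. Hence, as $h_\alpha \to \infty$,

(4.8)  $g_{ji}(\alpha) = \frac{1}{2} \int_{h_\alpha}^{\infty} G_0(u)
          \Bigl[\frac{(j + i)u}{i + ju} - 1\Bigr] du = [1 + o(1)]\, i\alpha/2j$.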
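
Lemma 4.3 and its relation to (4.5) can be illustrated for numerator d.f. $j = 2$, where the central F tail is $P\{Y > h\} = (1 + 2h/i)^{-i/2}$ and the derivative of the power at $\lambda = 0$ works out to $g_{2,i}(\alpha) = (i/4)\,\alpha\,(1 - \alpha^{2/i})$. This closed form is derived for the present sketch and is not quoted from the paper; the function name is illustrative.

```python
from math import expm1, log

def g_2_i(alpha, i):
    """Assumed closed form for j = 2: (i/4) * alpha * (1 - alpha**(2/i)).

    expm1 is used so that 1 - alpha**(2/i) stays accurate when i is large.
    """
    return (i / 4) * alpha * (-expm1((2 / i) * log(alpha)))

# Fixed i, alpha -> 0: the ratio to i*alpha/(2j) = i*alpha/4 tends to 1 (Lemma 4.3).
for alpha in (1e-2, 1e-4, 1e-6):
    print(alpha, g_2_i(alpha, 4) / (4 * alpha / 4))

# Fixed alpha, i -> infinity: the value tends to -alpha*log(alpha)/2, i.e. g_{2,inf}.
print(g_2_i(0.05, 10 ** 6), -0.05 * log(0.05) / 2)
```

The two limits are taken in different orders, which is why (4.5) and (4.7) do not conflict.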


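In the next lemma, we use the following notation: $n_i$ ($i = 1, \cdots, u$)
are again nonnegative integers with sum $N$. $S_u$ is the symmetric group on
$u$ symbols and, for $\pi$ in $S_u$,
$\bar\mu(\pi) = N^{-1} \sum_i n_{\pi i} \mu_i$; finally,
$\bar\mu = u^{-1} \sum_i \mu_i$.

LEMMA 4.4. For all $u > 1$, $\mu$, and $N$,

(4.9)  $\sum_{\pi \in S_u} \sum_i n_{\pi i} (\mu_i - \bar\mu(\pi))^2
          = u(u - 2)!\, \bigl[N - N^{-1} \sum_i n_i^2\bigr] \sum_i (\mu_i - \bar\mu)^2$.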


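PROOF. Since

(4.10)  $\sum_{\pi} n_{\pi i}^2 = (u - 1)! \sum_l n_l^2$

and, for $i \ne j$,

(4.11)  $\sum_{\pi} n_{\pi i} n_{\pi j} = (u - 2)!\, \bigl[N^2 - \sum_l n_l^2\bigr]$,

we have

(4.12)  $N^2 \sum_{\pi} \bar\mu(\pi)^2 = (u - 1)! \sum_l n_l^2 \sum_i \mu_i^2
           + (u - 2)!\, \bigl[N^2 - \sum_l n_l^2\bigr] \sum_{i \ne j} \mu_i \mu_j$

and

(4.13)  $\sum_{\pi} \sum_i n_{\pi i} \mu_i^2 = \sum_i \mu_i^2 \sum_{\pi} n_{\pi i}
           = (u - 1)!\, N \sum_i \mu_i^2$.

Equations (4.12) and (4.13), together with

(4.14)  $\sum_{\pi} \sum_i n_{\pi i} (\mu_i - \bar\mu(\pi))^2
           = \sum_{\pi} \sum_i n_{\pi i} \mu_i^2 - N \sum_{\pi} \bar\mu(\pi)^2$,

give (4.9).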
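
The identity (4.9) can be confirmed numerically by summing over all of $S_u$. The sketch below uses exact rational arithmetic; the design `n` and the means `mu` are hypothetical values chosen only to exercise the formula.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def lhs_4_9(n, mu):
    """Sum over pi in S_u of sum_i n[pi(i)] * (mu_i - mu_bar(pi))^2."""
    mu = [Fraction(m) for m in mu]
    N, u = sum(n), len(n)
    total = Fraction(0)
    for pi in permutations(range(u)):
        # mu_bar(pi) = N^{-1} * sum_i n[pi(i)] * mu_i
        mu_bar_pi = sum(n[pi[i]] * mu[i] for i in range(u)) / N
        total += sum(n[pi[i]] * (mu[i] - mu_bar_pi) ** 2 for i in range(u))
    return total

def rhs_4_9(n, mu):
    """u * (u-2)! * [N - N^{-1} * sum n_i^2] * sum_i (mu_i - mu_bar)^2."""
    mu = [Fraction(m) for m in mu]
    N, u = sum(n), len(n)
    mu_bar = sum(mu) / u
    bracket = N - Fraction(sum(x * x for x in n), N)
    return u * factorial(u - 2) * bracket * sum((m - mu_bar) ** 2 for m in mu)

n = [3, 2, 1, 0]       # hypothetical design, N = 6, u = 4
mu = [1, 4, -2, 7]     # hypothetical means
print(lhs_4_9(n, mu), rhs_4_9(n, mu))
```

When all observations go to a single treatment the bracket $N - N^{-1}\sum n_i^2$ vanishes, which matches the fact that such a design carries no information about differences among the $\mu_i$.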
The maximum for fixed $k(d)$ of the factor in square brackets on the right side
of (4.9) will of course be nondecreasing in $k(d)$. It is the factor
$g_{k(d)-1,N-k(d)}$ which will increase rapidly enough as $k(d)$ is decreased
to more than make up for the decrease in this term in brackets.

We are now ready to give our nonoptimality result in several illustrative 
examples of Case II, including those of Section 3C(1) and 3C(2). In all of these 
examples we ignore the divisibility properties; considerations when the design 
does not “divide up” properly (e.g., when k(d) does not divide N in Example 
(1) below) are messier and their consideration does not help in the understanding 
of the phenomenon we are illustrating; thus, we shall assume whatever divisi- 
bility properties of N are needed to make our examples simple. 

(1). One-way analysis of variance. In our first and simplest example, the setup
is that of Section 4A, except that we now are testing
$\mu_1 = \cdots = \mu_u$, and the appropriate F-tests are changed accordingly.
Our result has the same implication as that stated just above Theorem 4.1,
except that it now holds only when $\alpha$ is sufficiently small, and the
optimum $\delta$ chooses each pair $(i, j)$ ($i \ne j$) with equal probability
and sets $n_i = n_j = N/2$.

THEOREM 4.21. For every $d$, $\alpha$, and $c$, (4.1) holds; for fixed $k(d)$,
$a'_{\phi_{d,\alpha}}$ is strictly decreasing in $\sum n_i^2$, attaining its
maximum for $n_1 = \cdots = n_{k(d)} = N/k(d)$; for this choice of the $n_i$
and for all $\alpha$ in some neighborhood of 0, $a'_{\phi_{d,\alpha}}$ is
strictly decreasing in $k(d)$ for $k(d) > 1$; the results just stated for
$a'_{\phi_{d,\alpha}}$ hold also for $a_{\phi_{d,\alpha}}(c)$ for all $c$ in
some neighborhood of 0.


PROOF. From Lemma 4.4 and an argument like that of (4.2), we have, setting
$\lambda = \sum_i (\mu_i - \bar\mu)^2/\sigma^2$,

(4.15)  $\beta_{\phi_{d,\alpha}}(\mu, \sigma) = \alpha + g_{k(d)-1,N-k(d)}(\alpha)
           (u - 1)^{-1} \bigl(N - N^{-1} \sum n_i^2\bigr) \lambda + O(\lambda^2)$.

When $n_1 = \cdots = n_{k(d)} = N/k(d)$, the ratio of values of
$a'_{\phi_{d,\alpha}}$ corresponding to two values $k$ and $k'$ of $k(d)$ with
$1 < k < k'$ is thus

(4.16)  $\frac{g_{k-1,N-k}(\alpha)(1 - 1/k)}{g_{k'-1,N-k'}(\alpha)(1 - 1/k')}$;

as $\alpha \to 0$, by Lemma 4.3, this ratio approaches

(4.17)  $\frac{(N - k)/k}{(N - k')/k'} > 1$,

completing the proof.
For a numerical example, suppose $N = 6$, $u = 3$, with $\sigma$ known.
Comparing the $\delta_d$'s for which $k = 2$ and $k' = 3$, we see that
$(1 - 1/k)/(1 - 1/k') = 3/4$; thus, the ratio of the two
$a'_{\phi_{d,\alpha}}$ in this example is 3/4 times the ratio of second to
third column in the table above Lemma 4.2. For $\alpha < .3$, then, the design
with $k(d) = 2$ is locally better than that with $k(d) = 3$, in this example.
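
The crossover near $\alpha = .3$ can be reproduced from the closed forms for $g_{1,\infty}$ and $g_{2,\infty}$ given before the table. This is a sketch for this one example ($N = 6$, $u = 3$, $\sigma$ known, $k = 2$ against $k' = 3$); the function names are illustrative.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

def g1_inf(alpha):
    """Derivative of the chi-square test power with 1 d.f. at lambda = 0."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return c * exp(-c * c / 2) / sqrt(2 * pi)

def g2_inf(alpha):
    """Derivative of the chi-square test power with 2 d.f. at lambda = 0."""
    return -alpha * log(alpha) / 2

def local_ratio(alpha):
    """Ratio (4.16) for k = 2, k' = 3 with sigma known.

    The numerator d.f. are k - 1 = 1 and k' - 1 = 2, and the weight is
    (1 - 1/2) / (1 - 1/3) = 3/4.
    """
    weight = (1 - 1 / 2) / (1 - 1 / 3)
    return weight * g1_inf(alpha) / g2_inf(alpha)

for alpha in (0.01, 0.05, 0.10, 0.20, 0.30, 0.50, 0.90):
    better = "k(d) = 2" if local_ratio(alpha) > 1 else "k(d) = 3"
    print(f"{alpha:.2f}  ratio = {local_ratio(alpha):.3f}  locally better: {better}")
```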

(2). Several-way analysis of variance. With or without interactions, the con- 
siderations are very similar to those of Example (1), and we omit them. 

(3). One-way heterogeneity. In the setting described in Section 3A, suppose for
fixed $b$, $k$, and $u$ that BBD's exist for two possible choices $u_1$ and
$u_2$ of the “number of treatments” to be tested, say for $u_1$ and $u_2$ with
$1 < u_1 < u_2 \le u$. Let $d_i$ ($i = 1, 2$) be the design which uses the BBD
with parameters $b$, $k$, and $u_i$ to test the hypothesis
$\mu_1 = \cdots = \mu_{u_i}$, and let $\delta_{d_i}$ be the corresponding
randomized design which replaces the subscripts $1, \cdots, u_i$ here by
$\pi(1), \cdots, \pi(u_i)$ with probability $1/u!$ for each $\pi$ (or, which is
the same thing, which chooses each of the possible subsets of $u_i$ treatments
with equal probability). Otherwise, we use the same notation as in Example (1)
of this section.

For any design setting, the parameter of the non-central $\chi^2$ variable
$t_d' V_d^{-1} t_d/\sigma^2$ is $(Q_d R \mu)' V_d^{-1} (Q_d R \mu)$, and by
Lemma 2.3 and equation (3.1) this reduces in the case of a BBD $d^*$ with
parameters $b$, $k$, and $u$ to


(4.18) [ras = (Agen — Nae ) Ir] Zz (ue, si a) 'e . 


For the sake of arithmetical simplicity only, suppose that $k/u_i$ is either an
integer or is $< 1$ (the phenomenon to be studied persists without this
assumption). Then, for $d^* = d_i$, the term in square brackets in (4.18) is
easily computed to be

(4.19)  $f(u_i) = b(k - 1)/(u_i - 1)$ if $k/u_i \le 1$,
        $f(u_i) = bk/u_i$ if $k/u_i \ge 1$.

Using now the counterpart of (4.18) for the designs $d_i$ and the fact that,
for $n_1 = \cdots = n_{u_i} = 1$ and all other $n_j = 0$, (4.9) becomes

(4.20)  $\sum_{\pi \in S_u} \sum_i n_{\pi i} (\mu_i - \bar\mu(\pi))^2/u!
           = (u - 1)^{-1} (u_i - 1) \sum_{i=1}^{u} (\mu_i - \bar\mu)^2$,

we obtain, corresponding to (4.16),

(4.21)  $\frac{a'_{\phi_{d_1,\alpha}}}{a'_{\phi_{d_2,\alpha}}}
           = \frac{g_{u_1-1,\,bk-u_1-b+1}(\alpha)\,(u_1 - 1) f(u_1)}
                  {g_{u_2-1,\,bk-u_2-b+1}(\alpha)\,(u_2 - 1) f(u_2)}$.

By Lemma 4.3, as $\alpha \to 0$ this ratio approaches

(4.22)  $\frac{(bk - u_1 - b + 1) f(u_1)}{(bk - u_2 - b + 1) f(u_2)}$.

It is trivial to verify that $(bk - u - b + 1) f(u)$ is strictly decreasing in
$u$ for $u > 1$, so that the expression of (4.22) is $> 1$. Thus, we have proved

THEOREM 4.22. For fixed $b$, $k$, $u$ and all $\alpha$ in some neighborhood of
0, $a'_{\phi_{d_i,\alpha}}$ is strictly decreasing in $u_i$ for $u_i > 1$; the
same is true for $a_{\phi_{d_i,\alpha}}(c)$ for all $c$ in some neighborhood of
0.

This result implies that, if $k$ is even, the locally best $\delta_{d_i}$ is
that which chooses each pair of treatments with equal probability and assigns
each of the two chosen treatments to $k/2$ of the plots in every block.
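
The monotonicity used to conclude that (4.22) exceeds 1 can be tabulated for particular values. The sketch below evaluates $(bk - u - b + 1) f(u)$ with $f$ as in (4.19), under the same restriction that $k/u$ is an integer or at most 1. The values of `b` and `k` are hypothetical, and the code does not check whether a BBD with the given parameters actually exists.

```python
from fractions import Fraction

def f(u, b, k):
    """(4.19): b(k-1)/(u-1) when k/u <= 1, and b*k/u when k/u is an integer >= 1."""
    if k <= u:
        return Fraction(b * (k - 1), u - 1)
    if k % u == 0:
        return Fraction(b * k, u)
    raise ValueError("k/u must be an integer or at most 1")

def limit_factor(u, b, k):
    """The quantity (bk - u - b + 1) * f(u) appearing in (4.22)."""
    return (b * k - u - b + 1) * f(u, b, k)

b, k = 6, 4                      # hypothetical numbers of blocks and plots per block
candidates = (2, 4, 5, 6, 8)     # values of u with k/u an integer or <= 1
values = [limit_factor(u, b, k) for u in candidates]
for u, v in zip(candidates, values):
    print(u, v)
```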

(4). Two-way heterogeneity. Using (3.2) in place of (3.1), the analogue of
Theorem 4.22 can be proved for the YS design by an argument very similar to
that of Example (3) just above, and which we therefore omit. One can even give
an example of the lack of optimality of the YS in $\Delta_R$ without resorting
to this analysis: for the case $k_1 = 2$, $k_2 = 3$, $u = 3$, the usual YS
gives no d.f. to error, while the design which chooses two treatments at random
and assigns each treatment to three plots, at least once in each row and column
(full symmetry is impossible here) is uniformly more powerful for all $\alpha$
and all alternatives.

(5) Other examples. Examples like those mentioned in Section 3C (3) can be 
considered similarly, with analogous results. In particular, a trivial example in 
the case of a higher LS has already been mentioned in the first paragraph of 
Section 1. 


5. Remarks and extensions. We list a few of the variants of the examples
considered in this paper for which similar results hold, and make a few
comments on questions which arise in connection with the paper, some of which
present unanswered research problems.

1. A few of the other problems to which modifications of our method apply 
have been mentioned in Section 3C, and some of these will be considered else- 
where. Some such results hold under various non-normal probability laws (the 
point of the results of Section 4 is not merely that they hold for many models, 
but that they hold for the simplest, classical, normal model). Of course, a design 
which is optimum for one model may fail to be optimum for another, and vice 
versa; in particular, the results are obviously sensitive to change in the function 
$\psi$ (even to changes to other quadratic forms and for a fixed $d$, as indicated in
Section 2). Optimality criteria can be altered in other ways; e.g., one can con- 
sider M,-,--optimality, in imitation of 2A(c). The extent of completeness of non- 
optimality results like those on the higher LS design (first paragraph of Section 1) 
and YS design (Section 4B(4)) obviously depends on whether or not $\sigma$ is known.
The results for Model II and certain mixed models of the analysis of variance 
differ considerably from those for the model considered herein, since the de- 
pendence of the power function on the design (and on the test, for a fixed design) 
is so different; however, similar methods can be used there. 

2. Besides changing the model, one can also change the decision space. From
the examples cited just above regarding higher LS and YS designs, it is clear
that nonoptimality results for some classical symmetrical designs hold for many
decision problems. For normal and certain nonparametric point estimation
problems, the discussion of [2] and [3] indicates why Section 3 yields
optimality results (these actually hold for many weight functions other than
squared error). Another typical estimation result is contained in the fact that
the designs $d^*$ of Theorems 3.1 and 3.2 maximize the trace of $V_d^{-1}$ and
that $V_{d^*}$ is a multiple of the identity; from these it follows at once
that average variance of $t_d$ (= trace of $\sigma^2 V_d/(u - 1)$) is a minimum
for $d^*$. However, the results of Section 4 are meaningless for many common
weight functions, since $V_d$ is not the covariance matrix of b.l.e.'s. Similar
results hold for some interval estimation problems; for estimating
$\psi(\mu)/\sigma^2$ (e.g., in “multiple comparison” problems), Section 4 is
now sometimes relevant. Multiple classification and ranking problems can be
treated in like manner. Of course, a D-optimum design minimizes the approximate
generalized variance in point estimation problems.

3. As we have mentioned, nonoptimality results like those of Section 4 do not
depend on the nonrandomized design being symmetrical. Much more difficult is
the problem of characterizing optimum designs in the sense of Section 3 when
there is no appropriate symmetry. (Even the considerations of Sections 3B(2)
and 4B(3 and 4) become messier without the restrictions on $k_i/u$ and $k/u$;
it would be nice if neat proofs could be given in such cases.) It seems often
to be true that a design which is “closest to being symmetrical” in an
appropriate sense (e.g., note the dependence on $\sum n_i^2$ in Theorem 4.21)
is optimum, but the algebra involved in proving this can be tedious. Problems
like that cited in the next to last paragraph of Section 3C(3) can be similarly
unwieldy under heteroscedasticity. In connection with a general
symmetry-invariance approach like that mentioned below (1.3), we note that
appropriate symmetry of $X_d$ is useful as a partial sufficient condition for
some optimality results, but that appropriate symmetry of $X_d' X_d$ is what is
really relevant (for the functions $\psi$ we have considered).


4. We have mentioned in Section 2 some of the difficulties present in verifying
M- (or sometimes L-) optimality. If $b_d$ is not a constant for $d$ in
$\Delta'$, or if randomized designs are considered, this difficulty is
increased by the nonconstancy of the d.f. for $S_d$, etc. (We have not
considered here a thorough investigation of the optimality properties of the
procedures $\delta_d$ of Section 4.) The difficulty encountered in connection
with M-optimality in the nonconstancy of the power functions of competing tests
on appropriate contours also manifests itself when one tries to find a most
stringent design (the “envelope power function” being obtained by taking the
supremum of $\beta_\phi$ over all $\phi$ in $H_d(\alpha)$ and all $d$ in
$\Delta$ or $\Delta_R$). The method of invariance used to prove 2A(f) cannot
even supply a start here, and the method of [6] or [7] used to prove 2A(c)
yields no analogue here where $d$ is not fixed. Thus, even in such a simple
example as that of Section 2B, the stringency problem seems extremely
difficult.

It is interesting to note that the $\delta_d$ of Section 4 lack a “consistency”
property if $k(d) < r$, in that $a_{\phi_{d,\alpha}}(c)$ does not approach 1 as
$c \to \infty$ (in fact, it is easy to see that the $\mu$ for which one
component of $R\mu$ is $\sigma\sqrt{c}$ and all others are 0 is asymptotically
worst on the contour $\psi(\mu)/\sigma^2 = c$ as $c \to \infty$, giving power
approaching $[k(d) + (r - k(d))\alpha]/r$). Nevertheless, the question remains
open as to whether any of these $\delta_d$, or some other design and associated
test which lacks this consistency property, is nevertheless most stringent.

The reader will not find it difficult in considerations like those of Section 3B to 
supply the details which show, in some problems, that the D-optimum (or L- 
or /-optimum) design is unique. When uniqueness is not present (e.g., for some 
$\alpha$ and $c$, both designs in Section 2B will be L-optimum), questions of
global admissibility arise. A related problem is to look not at a fixed contour
or family of contours in the manner of Section 2, but rather to characterize
complete classes of designs in the manner of [3]; in such considerations,
especially for problems of testing hypotheses, Section 4 shows that results
like those of [3] must be altered if $\Delta_R$ is considered rather than
$\Delta$.

Finally, we may remark that, for a fixed $d$, the problem of characterizing an
$L_\alpha$-optimum test is unsolved; the generalized Neyman-Pearson Lemma does
not seem to yield explicit results easily, although it is not difficult to show
that an $L_\alpha$-optimum test is obtained by replacing the numerator of the
F-test by some other quadratic form.
DISTINGUISHABILITY OF SETS OF DISTRIBUTIONS
(The case of independent and identically distributed chance variables) 
1. Introduction. Suppose it is desired to make one of two decisions, $d_1$ and
$d_2$, on the basis of independent observations on a chance variable whose
distribution $F$ is known to belong to a set $\mathcal{F}$. There are given two
subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ such that decision
$d_1$ ($d_2$) is strongly preferred if $F$ is in $\mathcal{G}$
($\mathcal{H}$). Then it is reasonable to look for a test (decision rule) which
makes the probability of an erroneous decision small when $F$ belongs to
$\mathcal{G}$ or $\mathcal{H}$, and at the same time exercises some control
over the number of observations required to reach a decision when $F$ is in
$\mathcal{F}$ (not only in $\mathcal{G}$ or $\mathcal{H}$).

This paper is concerned with criteria that enable us to decide whether, for
given sets $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$, there exists a test
of the described type. More precisely, we shall consider several classes of
tests, such as the class of all fixed sample size tests, or the class of all
tests which terminate with probability one whenever $F$ is in $\mathcal{F}$.
Thus the restriction to tests in one of these classes is equivalent to imposing
some sort of control, of a purely qualitative nature, on the sample size. We
then shall try to find necessary and/or sufficient conditions for the existence
of a test in a given class which makes the maximum error probability in
$\mathcal{G} \cup \mathcal{H}$ less than any preassigned positive number.

If such a test exists, we shall say that the sets $\mathcal{G}$ and
$\mathcal{H}$ are distinguishable$^4$ in the given class $\mathcal{T}$ of
tests. If $\mathcal{T}$ is the class of all fixed sample size tests, the
distinguishability of $\mathcal{G}$ and $\mathcal{H}$ in $\mathcal{T}$ is
equivalent to the existence of what has been called a uniformly consistent
sequence of tests for testing $F \in \mathcal{G}$ against $F \in \mathcal{H}$.

The sets $\mathcal{G}$ and $\mathcal{H}$ will be called indistinguishable in
$\mathcal{T}$ if for any test in $\mathcal{T}$ the sum of the maximum error
probability in $\mathcal{G}$ and the maximum error probability in
$\mathcal{H}$ is at least one. (There always exists a trivial test for which
this sum is equal to one.) In section 2 it will be shown that, with the present
restriction to sequences of independent and identically distributed chance
variables, two sets are either distinguishable or indistinguishable in any of
the classes $\mathcal{T}$ which we shall consider.
$^4$ In [3] the term distinguishable was used in another sense.
Since we confine ourselves to tests based on a sequence $X_1, X_2, \cdots$ of
independent, identically distributed chance variables, we may restrict
ourselves to sequential tests. A sequential test is determined by the sample
size function $N$ and the terminal decision function $\phi$, and will be
denoted by $(N, \phi)$. Here $N$ is a chance variable whose values are the
non-negative integers and $+\infty$, and whose conditional distribution, given
any sequence $x = (x_1, x_2, \cdots)$ of possible values of
$X_1, X_2, \cdots$, is such that the probability of $N \le n$ does not depend
on $x_{n+1}, x_{n+2}, \cdots$, for all non-negative integers $n$. The function
$\phi$ is a function of $(x_1, \cdots, x_N)$ whose values range from 0 to 1.
The test $(N, \phi)$ consists in taking one observation on each of the first
$N$ chance variables of the sequence, finding the corresponding value of
$\phi$, and making decision $d_1$ or $d_2$ with respective probabilities
$1 - \phi$ and $\phi$. The function $\phi$ and the conditional distribution
function of $N$ given $x$ are always understood to be measurable on the
appropriate $\sigma$-field. The sample size function $N$ and the terminal
decision function $\phi$ are said to be non-randomized if the respective
functions $P[N \le n \mid x]$ and $\phi(x)$ take on the values 0 and 1 only. A
test $(N, \phi)$ will be called non-randomized if both $N$ and $\phi$ are
non-randomized.

We use the term distribution synonymously with probability measure. The set
$\mathcal{F}$ consists of distributions on a fixed $\sigma$-field $\mathcal{A}$
of subsets of a space $\mathcal{X}$. Unless we state otherwise, we shall assume
that $\mathcal{X}$ is a $k$-dimensional Euclidean space and $\mathcal{A}$ the
$k$-dimensional Borel field. A distribution on $\mathcal{A}$ will then be
called a $k$-dimensional or $k$-variate distribution. If $F$ is a distribution
on $\mathcal{A}$, we denote by $F[A]$ the probability of the set
$A \in \mathcal{A}$ and by $F(x) = F(x^{(1)}, \cdots, x^{(k)})$,
$x \in \mathcal{X}$, the associated distribution function, that is,
$F(x) = F[\{y : y^{(1)} \le x^{(1)}, \cdots, y^{(k)} \le x^{(k)}\}]$. With the
usual definition (see [5])* of the distribution of a sequence
$X = (X_1, X_2, \cdots)$ of independent chance variables with identical
marginal distribution $F$, we denote by $P_F[B]$ the probability of a
measurable set $B$ in the range of $X$, and write $E_F \psi$ for the expected
value of a function $\psi$ of $X$.

According to our definitions, the probability of an erroneous decision when
test $(N, \phi)$ is used is equal to $E_F \phi$ if $F \in \mathcal{G}$, and to
$E_F(1 - \phi)$ if $F \in \mathcal{H}$. Thus the sets $\mathcal{G}$ and
$\mathcal{H}$ are distinguishable in a class $\mathcal{T}$ of tests if and only
if for every $\epsilon > 0$ there exists a test $(N, \phi)$ in $\mathcal{T}$
such that $E_F \phi < \epsilon$ for $F \in \mathcal{G}$ and
$E_F(1 - \phi) < \epsilon$ for all $F \in \mathcal{H}$.

2. Modes of distinguishability. We shall be concerned with the
distinguishability of two sets of distributions in various classes
$\mathcal{T}$ of tests, which are defined in terms of properties of the
distribution of the sample size function $N$. Some classes of particular
interest are the following.

    $\mathcal{T}_0$:     $P_F[N < \infty] = 1$ if $F \in \mathcal{F}$.
    $\mathcal{T}_1(r)$:  $E_F N^r < \infty$ if $F \in \mathcal{F}$ ($r > 0$).
    $\mathcal{T}_1$:     $E_F N^r < \infty$ for all $r > 0$ if $F \in \mathcal{F}$.
    $\mathcal{T}_2$:     $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.
    $\mathcal{T}_3$:     $\max(N) < \infty$.

* The numbers in square brackets refer to the bibliography listed at the end.
It will be noted that each of the successive classes contains the one
following. Some classes of obvious interest have been omitted because, for the
purposes of our investigation, they are equivalent to some of the classes
listed above. Thus if two sets are distinguishable in one of the classes
$\mathcal{T}_0, \cdots, \mathcal{T}_3$, they are also distinguishable in the
corresponding subclass which contains only the non-randomized tests; this
follows from Theorem 2.1 below. If two sets are distinguishable in
$\mathcal{T}_3$ (the class of “truncated” sequential tests), they are clearly
distinguishable in the class of all fixed sample size tests; for if
$(N, \phi)$ is any test in $\mathcal{T}_3$, and we put $N' = \max(N)$,
$\phi' = E[\phi \mid x]$, then $(N', \phi')$ is a fixed sample size test such
that $E_F \phi' = E_F \phi$ for all $F$.

In view of the importance of the two extreme classes, $\mathcal{T}_0$ and
$\mathcal{T}_3$, we shall use the following terms. If two sets of distributions
are distinguishable (indistinguishable) in $\mathcal{T}_0$, they will be called
distinguishable ($\mathcal{T}_0$) (indistinguishable ($\mathcal{T}_0$)). If two
sets are distinguishable (indistinguishable) in $\mathcal{T}_3$, we shall say
that they are finitely distinguishable (finitely indistinguishable).

The classes $\mathcal{T}_i$ have been defined in terms of the set $\mathcal{F}$
to which the distribution of $X_1$ is assumed to belong (without displaying
$\mathcal{F}$ in the notation). It may be of interest to consider also the
corresponding classes where $\mathcal{F}$ is replaced by some subset of
$\mathcal{F}$ (compare Lemma 4.1 in section 4). It will be clear that Theorems
2.1 and 4.1 below can be immediately extended to such classes.

Our list does not contain the subclass of $\mathcal{T}_1(r)$ where $E_F N^r$ is
bounded for $F \in \mathcal{F}$, nor the subclass of $\mathcal{T}_0$ where
$P_F[N > n] \to 0$ as $n \to \infty$, uniformly for $F \in \mathcal{F}$. The
reason for this omission is that two sets $\mathcal{G}$ and $\mathcal{H}$ which
are distinguishable in one of these classes are finitely distinguishable. This
follows from the following fact: If $(N, \phi)$ is a test such that
$P_F[N > n] \to 0$ as $n \to \infty$, uniformly for
$F \in \mathcal{G} \cup \mathcal{H}$, then for every $\epsilon > 0$ there
exists a test $(N', \phi')$ such that $\max(N') < \infty$ and
$|E_F \phi' - E_F \phi| < \epsilon$ for all
$F \in \mathcal{G} \cup \mathcal{H}$. This is so since, by our assumption, we
can choose an integer $n = n(\epsilon)$ such that $P_F[N > n] < 2\epsilon$ for
all $F \in \mathcal{G} \cup \mathcal{H}$, and the test $(N', \phi')$ defined by

    $\phi' = \phi$, $N' = N$ if $N \le n$;   $\phi' = 1/2$, $N' = n$ if $N > n$

has the stated property.

Let $\mathcal{T}$ be any class of tests. If $\Phi = \Phi(\mathcal{T})$ denotes the class of all terminal decision functions $\phi$ of the tests in $\mathcal{T}$, the statement that $\mathcal{G}$ and $\mathcal{H}$ are distinguishable in $\mathcal{T}$ can be expressed by the equation

(2.1)    $\sup_{\phi \in \Phi} \inf_{G \in \mathcal{G},\, H \in \mathcal{H}} (E_H \phi - E_G \phi) = 1.$

Whenever $\mathcal{T}$ contains a trivial test such that $\phi = \text{const}$, the left side of (2.1) is at least zero. Let us say that a test in $\mathcal{T}$ is nontrivial for distinguishing between $\mathcal{G}$ and $\mathcal{H}$ if $\sup_{G \in \mathcal{G}} E_G \phi < \inf_{H \in \mathcal{H}} E_H \phi$. Thus the left side of (2.1) is positive if and only if $\mathcal{T}$ contains a nontrivial test for distinguishing between $\mathcal{G}$ and $\mathcal{H}$. The following theorem shows that if $\mathcal{T}$ is one of the classes $\mathcal{T}_0, \ldots, \mathcal{T}_3$ (or one of the "equivalent" classes mentioned above), then the existence in $\mathcal{T}$ of a nontrivial test for distinguishing between $\mathcal{G}$ and $\mathcal{H}$ is sufficient for $\mathcal{G}$ and $\mathcal{H}$ to be distinguishable in $\mathcal{T}$, and even in the class $\mathcal{T}'$ which consists of the non-randomized tests in $\mathcal{T}$. The special case of the theorem where $\mathcal{T}$ is the class of all non-randomized fixed sample size tests is contained in a lemma of Berger [1] (which is there attributed to Bernoulli).

We denote by $\Phi$ and $\Phi'$ the classes of the terminal decision functions of the tests in $\mathcal{T}$ and $\mathcal{T}'$, respectively.


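THEOREM 2.1. If $\mathcal{T}$ is one of the classes $\mathcal{T}_0, \ldots, \mathcal{T}_3$, then

(2.2)    $\sup_{\phi \in \Phi} \inf_{G \in \mathcal{G},\, H \in \mathcal{H}} (E_H \phi - E_G \phi) > 0$

implies

(2.3)    $\sup_{\phi \in \Phi'} \inf_{G \in \mathcal{G},\, H \in \mathcal{H}} (E_H \phi - E_G \phi) = 1.$

Hence

(2.4)    $\sup_{\phi \in \Phi} \inf_{G \in \mathcal{G},\, H \in \mathcal{H}} (E_H \phi - E_G \phi) = 0$ or $1$.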













For the proof of Theorem 2.1 we require the following
LEMMA. If $\mathcal{T}$ is one of the classes $\mathcal{T}_0, \ldots, \mathcal{T}_3$, and $(N, \phi)$ is in $\mathcal{T}$, then for every $\epsilon > 0$ there is a test $(N', \phi')$ in $\mathcal{T}$ such that $N'$ is non-randomized and $|\phi' - \phi| < \epsilon$.
PROOF. Let $N'$ be the least integer $n \ge 1$ such that

$P[N > n \mid x] < \epsilon.$

Define $\phi'$ by

$\phi' = E[\phi \mid N \le n, x]$ if $N' = n$, $\quad n = 1, 2, \ldots.$

Thus $(N', \phi')$ is a test, and $N'$ is non-randomized.
We have for every $n \ge 1$

$P[N' > n] = P\{P[N > n \mid x] \ge \epsilon\} \le \epsilon^{-1} E\,P[N > n \mid x] = \epsilon^{-1} P[N > n].$

Since for any increasing function $h$ on the nonnegative integers

$E\,h(N) = h(0) + \sum_{n=0}^{\infty} \{h(n + 1) - h(n)\}\, P[N > n],$

it follows that if $N$ satisfies the condition for any of the classes $\mathcal{T}_0, \ldots, \mathcal{T}_3$, so does $N'$. Hence $(N', \phi')$ is in $\mathcal{T}$.
Now if $N' = n$, we have from the definition of $\phi'$

$\phi - \phi' = P[N \le n \mid x]\, E[\phi \mid N \le n, x] + P[N > n \mid x]\, E[\phi \mid N > n, x] - \phi'$

$\qquad\quad = P[N > n \mid x]\,(E[\phi \mid N > n, x] - \phi').$

Thus $|\phi - \phi'| \le P[N > n \mid x]$ if $N' = n$. But $N' = n$ implies $P[N > n \mid x] < \epsilon$, for all $n$. This completes the proof of the lemma.
PROOF OF THEOREM 2.1. If condition (2.2) is satisfied, $\mathcal{T}$ contains a test $(N, \phi)$ such that

$\alpha = \sup_{G \in \mathcal{G}} E_G \phi < \inf_{H \in \mathcal{H}} E_H \phi = \beta.$

By the preceding lemma we may and shall assume that $N$ is non-randomized. Let $\epsilon$ be any positive number. The theorem will be proved by showing that there is a non-randomized test $(N', \phi')$ in $\mathcal{T}$ such that

(2.5)    $\inf_{H \in \mathcal{H}} E_H \phi' - \sup_{G \in \mathcal{G}} E_G \phi' > 1 - \epsilon.$


Choose a positive integer $m$ which satisfies the inequality

$\left(\frac{2}{\beta - \alpha}\right)^2 \frac{1}{m} < \frac{\epsilon}{2}.$

Define the test $(N', \phi')$ as follows. First apply test $(N, \phi)$, and denote the resulting values of $N$ and $\phi$ by $N_1$ and $\phi_1$. Then apply the same test to a new independent sequence of observations and note the values $N_2$ and $\phi_2$ of $N$ and $\phi$. Continue in this way until $m$ independent sequences of observations have been taken. The total sample size is $N' = N_1 + \cdots + N_m$. Since $N$ is non-randomized, so is $N'$. Now put

$\bar\phi = \frac{1}{m} \sum_{j=1}^{m} \phi_j,$

and let $\phi' = 1$ or $0$ according as $\bar\phi \ge \frac{1}{2}(\alpha + \beta)$ or $\bar\phi < \frac{1}{2}(\alpha + \beta)$.

Thus $(N', \phi')$ is a non-randomized test.
The chance variables $\phi_1, \ldots, \phi_m$ are independent, and each has the same distribution as $\phi$. Hence $E\bar\phi = E\phi$, and the variance of $\bar\phi$ is less than $1/m$. If $G \in \mathcal{G}$, then $E_G \phi \le \alpha$, so that

$E_G \phi' = P_G\left[\bar\phi \ge \frac{\alpha + \beta}{2}\right] \le P_G\left[\bar\phi - E_G \phi \ge \frac{\beta - \alpha}{2}\right] < \frac{1}{m}\left(\frac{2}{\beta - \alpha}\right)^2$

by Chebyshev's inequality. Hence

$\sup_{G \in \mathcal{G}} E_G \phi' < \frac{\epsilon}{2}.$

In a similar way it is seen that

$\sup_{H \in \mathcal{H}} E_H (1 - \phi') < \frac{\epsilon}{2},$

so that inequality (2.5) is satisfied.
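We now show that the test $(N', \phi')$ is in $\mathcal{T}$. For $\mathcal{T} = \mathcal{T}_0$ and $\mathcal{T} = \mathcal{T}_3$ this is obvious. Since for $r > 0$,

$(N')^r = \left(\sum_{i=1}^{m} N_i\right)^r \le \left(m \max_{i=1,\ldots,m} N_i\right)^r = m^r \max_{i=1,\ldots,m} (N_i^r) \le m^r \sum_{i=1}^{m} N_i^r,$

and each $N_i$ has the same distribution as $N$, we have $E(N')^r < \infty$ whenever $E N^r < \infty$. This proves the statement for $\mathcal{T} = \mathcal{T}_1(r)$ and $\mathcal{T} = \mathcal{T}_1$.
Finally, if $E e^{tN} < \infty$, where $t > 0$, put $t' = t/m$. Since $N_1, \ldots, N_m$ are independent and distributed as $N$, $E e^{t'N'} = (E e^{t'N})^m < \infty$.
Thus $(N', \phi')$ is in $\mathcal{T}$ in every case. The proof is complete.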
It should be noted that if $X_1, X_2, \ldots$ are not independent and identically distributed, the analog of Theorem 2.1 is not true in general.
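The repetition argument in the proof of Theorem 2.1 can be illustrated numerically. The following is only a sketch under stated assumptions: the function names are hypothetical, the individual decisions $\phi_j$ are modelled as Bernoulli variables (a toy choice, not part of the theorem), and the combined decision is taken to be 1 when the average of the $m$ decisions reaches the midpoint $(\alpha + \beta)/2$.

```python
import math
import random


def repetitions_needed(alpha, beta, eps):
    """Smallest integer m with (2 / (beta - alpha))**2 / m < eps / 2."""
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("need 0 <= alpha < beta <= 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    # The inequality is equivalent to m > 2 * (2 / (beta - alpha))**2 / eps.
    return math.floor(2.0 * (2.0 / (beta - alpha)) ** 2 / eps) + 1


def combined_decision(phis, alpha, beta):
    """Return 1 if the average decision reaches (alpha + beta) / 2, else 0."""
    return 1 if sum(phis) / len(phis) >= (alpha + beta) / 2.0 else 0


def simulated_mean_decision(p, alpha, beta, m, trials=2000, seed=0):
    """Monte Carlo estimate of E(phi') when each phi_j is Bernoulli(p).

    The Bernoulli model is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        phis = [1 if rng.random() < p else 0 for _ in range(m)]
        total += combined_decision(phis, alpha, beta)
    return total / trials
```

With $\alpha = 0.25$, $\beta = 0.75$ and $\epsilon = 0.5$ the helper returns $m = 65$; Chebyshev's inequality is crude, so the simulated error rates are typically far below the guaranteed $\epsilon/2$.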


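3. Sufficient conditions for distinguishability. Let $\mathcal{K}$ be a set of distributions on $\mathcal{B}$. A distance in $\mathcal{K}$ is a nonnegative function $\delta$ of the pairs $(G, H)$ of distributions in $\mathcal{K}$ such that $\delta(G, G) = 0$, $\delta(G, H) = \delta(H, G)$, and $\delta(G, H) \le \delta(G, K) + \delta(H, K)$, for all $G$, $H$, and $K$ in $\mathcal{K}$. (We do not require that $\delta(G, H) = 0$ imply $G = H$.) We write $\delta(G, \mathcal{H})$ for $\inf_{H \in \mathcal{H}} \delta(G, H)$, and $\delta(\mathcal{G}, \mathcal{H})$ for $\inf_{G \in \mathcal{G}} \delta(G, \mathcal{H})$.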

Let $F_n$ denote the empiric distribution of the first $n$ members, $X_1, \ldots, X_n$, of a sequence of independent chance variables with the common distribution $F \in \mathcal{F}$; that is, $nF_n[A]$ is the number of indices $i \le n$ for which $X_i \in A$. We assume throughout that the set $\mathcal{K}$ in which a distance $\delta$ is defined contains $\mathcal{F}$ and all empiric distributions.

We shall say that a distance $\delta$ is consistent in $\mathcal{F}$ if for every $\epsilon > 0$

(3.1)    $\lim_{n \to \infty} P_F[\delta(F_n, F) > \epsilon] = 0$

whenever $F \in \mathcal{F}$. The distance $\delta$ will be called uniformly consistent in $\mathcal{F}$ if the convergence in (3.1) is uniform for $F \in \mathcal{F}$.

In this section we derive sufficient conditions for distinguishability in terms of uniformly consistent distances. We first mention a few examples of such distances. If $\mathcal{F}$ is the set of all distributions on the $k$-dimensional Borel field $\mathcal{B}$, and $\mathcal{X}$ denotes the $k$-dimensional Euclidean space, the distance

(3.2)    $D(G, H) = \sup_{x \in \mathcal{X}} |G(x) - H(x)|$

is known to be uniformly consistent in $\mathcal{F}$ (see, for example, [4]). So is the distance

$\left(\int_{\mathcal{X}} |G(x) - H(x)|^r \, dK\right)^{1/r}$

where $r \ge 1$ and $K$ is a fixed distribution on $\mathcal{B}$, since it is bounded above by $D(G, H)$. A further example of a uniformly consistent distance is

$D_w(G, H) = D(G_w, H_w),$

where $G_w$ and $H_w$ are the distributions, according to $G$ and $H$, of a fixed, real- or vector-valued measurable function $w$ on $\mathcal{X}$. If $\mu(F)$ denotes the mean of a one-dimensional distribution $F$, the distance $|\mu(G) - \mu(H)|$ is uniformly consistent in any class of distributions with bounded variances.

A sufficient condition for finite distinguishability is the following. If the distance $\delta$ is uniformly consistent in $\mathcal{G} \cup \mathcal{H}$ and

(3.4)    $\delta(\mathcal{G}, \mathcal{H}) > 0,$

then the sets $\mathcal{G}$ and $\mathcal{H}$ are finitely distinguishable.

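This can be seen by using the test with $N = n$ fixed and $\phi = 1$ or $0$ according as $\delta(F_n, \mathcal{G}) - \delta(F_n, \mathcal{H}) \ge 0$ or $< 0$. If $F \in \mathcal{G}$, then $\delta(F_n, \mathcal{G}) \le \delta(F_n, F)$ and $\delta(F_n, \mathcal{H}) \ge \delta(F, \mathcal{H}) - \delta(F_n, F) \ge \delta(\mathcal{G}, \mathcal{H}) - \delta(F_n, F)$. Hence $E_F \phi$ does not exceed

$\sup_{F \in \mathcal{G} \cup \mathcal{H}} P_F\left[\delta(F_n, F) \ge \tfrac{1}{2}\delta(\mathcal{G}, \mathcal{H})\right].$

We obtain the same upper bound for $E_F(1 - \phi)$, $F \in \mathcal{H}$. Our assumptions imply that the bound tends to 0 as $n \to \infty$.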
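The fixed-sample test just described can be sketched in code for the distance $D$ of (3.2) in one dimension. This is an illustration only: the function names are hypothetical, the sets $\mathcal{G}$ and $\mathcal{H}$ are assumed finite and given as lists of continuous distribution functions, and normal distributions are used merely as a convenient example.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """Distribution function of the normal law with mean mu and std. dev. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def kolmogorov_distance(sample, cdf):
    """D(F_n, F) = sup_x |F_n(x) - F(x)| for a continuous distribution function F."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum is attained just before or at one of the jump points of F_n.
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def distance_to_set(sample, cdfs):
    """delta(F_n, set) = infimum of the distance over the (finite) set."""
    return min(kolmogorov_distance(sample, cdf) for cdf in cdfs)


def minimum_distance_test(sample, cdfs_G, cdfs_H):
    """phi = 1 if delta(F_n, G) >= delta(F_n, H), and phi = 0 otherwise."""
    return 1 if distance_to_set(sample, cdfs_G) >= distance_to_set(sample, cdfs_H) else 0
```

A decision of 0 favors $\mathcal{G}$ and a decision of 1 favors $\mathcal{H}$, matching the convention that $E_F \phi$ should be small for $F \in \mathcal{G}$.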


In the proof of the next theorem we shall make use of a test defined as follows.

Let $\delta$ be a distance, $\{c_i\}$, $i = 1, 2, \ldots$, a sequence of positive numbers, and $\{n_i\}$, $i = 1, 2, \ldots$, an increasing sequence of positive integers. Put

$\delta_i = \max[\delta(F_{n_i}, \mathcal{G}), \delta(F_{n_i}, \mathcal{H})].$

Take successive independent samples of sizes $n_1, n_2 - n_1, n_3 - n_2, \ldots$. Continue sampling as long as $\delta_i < c_i$. Stop sampling as soon as $\delta_i \ge c_i$, and apply the terminal decision function

$\phi = 1$ if $\delta(F_{n_i}, \mathcal{G}) \ge \delta(F_{n_i}, \mathcal{H})$,
$\phi = 0$ if $\delta(F_{n_i}, \mathcal{G}) < \delta(F_{n_i}, \mathcal{H})$.

Thus $N = n_i$, where $i$ is the least integer for which $\delta_i \ge c_i$.

We shall refer to this test as the test $T(\delta, \{c_i\}, \{n_i\})$.
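The stopping rule of $T(\delta, \{c_i\}, \{n_i\})$ can be written out as a short routine. The sketch below makes several assumptions that are not in the text: the sequences are truncated to finite lists (so the routine may return `None` if no stage stops), the observations are supplied by a caller-provided function `draw`, and the example distance is the difference of means mentioned earlier in this section. All names are hypothetical.

```python
def sequential_test(draw, dist_to_G, dist_to_H, c, n):
    """Run the test T(delta, {c_i}, {n_i}) on finite prefixes of the two sequences.

    draw(k) must return a list of k new independent observations.
    dist_to_G(sample) and dist_to_H(sample) return delta(F_n, G) and delta(F_n, H).
    Returns (N, phi), or None if sampling did not stop within the given stages.
    """
    sample = []
    for c_i, n_i in zip(c, n):
        # Extend the cumulative sample to size n_i.
        sample.extend(draw(n_i - len(sample)))
        d_G = dist_to_G(sample)
        d_H = dist_to_H(sample)
        if max(d_G, d_H) >= c_i:  # delta_i >= c_i: stop and decide
            return n_i, (1 if d_G >= d_H else 0)
    return None


def mean_distance_to(means):
    """Distance |mean(F_n) - mu| minimized over a finite set of candidate means."""
    def dist(sample):
        xbar = sum(sample) / len(sample)
        return min(abs(xbar - mu) for mu in means)
    return dist
```

For example, with `dist_to_G = mean_distance_to([-1.0])` and `dist_to_H = mean_distance_to([1.0])`, sampling continues only while the sample mean is within $c_i$ of both candidate means.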

THEOREM 3.1. (a) If the distance $\delta$ is uniformly consistent in $\mathcal{F}$, then any two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ for which

(3.6)    $\max[\delta(F, \mathcal{G}), \delta(F, \mathcal{H})] > 0$ if $F \in \mathcal{F}$

are distinguishable ($\mathcal{F}$).

(b) If, for every $c > 0$, there exist two positive numbers $A(c)$ and $B(c)$ such that for all integers $n > 0$ and all $F \in \mathcal{F}$

(3.7)    $P_F[\delta(F_n, F) \ge c] < A(c) e^{-B(c) n},$

then any two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ which satisfy (3.6) are distinguishable in the class of tests $(N, \phi)$ such that $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.

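PROOF. Let $\alpha$ be a positive number. Part (a) will be proved by showing that the sequences $\{c_i\}$ and $\{n_i\}$ can be so chosen that the test $(N, \phi) = T(\delta, \{c_i\}, \{n_i\})$ satisfies the conditions

(3.8)    $E_F \phi \le \alpha$ if $F \in \mathcal{G}$, $\quad E_F(1 - \phi) \le \alpha$ if $F \in \mathcal{H}$

and

(3.9)    $P_F[N < \infty] = 1$ if $F \in \mathcal{F}$.

Let $\{c_i\}$ be a sequence of positive numbers such that

(3.10)    $\lim_{i \to \infty} c_i = 0.$

Choose the positive numbers $\alpha_1, \alpha_2, \ldots$ so that

(3.11)    $\sum_{i=1}^{\infty} \alpha_i \le \alpha.$

Since $\delta$ is uniformly consistent in $\mathcal{F}$, we can choose the integers $n_1 < n_2 < \cdots$ in such a way that

(3.12)    $P_F[\delta(F_{n_i}, F) \ge c_i] \le \alpha_i, \quad i = 1, 2, \ldots$

for all $F \in \mathcal{F}$.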
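If $F \in \mathcal{G}$,

$E_F \phi = \sum_{i=1}^{\infty} P_F[\delta_1 < c_1, \ldots, \delta_{i-1} < c_{i-1},\ \delta_i \ge c_i,\ \delta(F_{n_i}, \mathcal{G}) \ge \delta(F_{n_i}, \mathcal{H})] \le \sum_{i=1}^{\infty} P_F[\delta(F_{n_i}, F) \ge c_i],$

since on the event in question $\delta(F_{n_i}, F) \ge \delta(F_{n_i}, \mathcal{G}) = \delta_i \ge c_i$.

It now follows from (3.12) and (3.11) that $E_F \phi \le \alpha$ if $F \in \mathcal{G}$. In a similar way it is seen that $E_F(1 - \phi) \le \alpha$ if $F \in \mathcal{H}$. Thus the conditions (3.8) are satisfied.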







The terminal sample size $N$ takes on the values $n_1, n_2, \ldots$, and we have

$P_F[N > n_j] = P_F[\delta_i < c_i,\ i = 1, \ldots, j] \le P_F[\delta_j < c_j].$

By the triangle inequality,

$\delta_j \ge \delta^* - \delta(F_{n_j}, F),$

where

$\delta^* = \max[\delta(F, \mathcal{G}), \delta(F, \mathcal{H})].$

By assumption, $\delta^* > 0$ for all $F \in \mathcal{F}$.
Hence if $F \in \mathcal{F}$,

(3.13)    $P_F[N > n_j] \le P_F[\delta(F_{n_j}, F) > \delta^* - c_j].$

Since $c_j \to 0$, we have $\delta^* - c_j > c_j$ for $j$ sufficiently large, and then the right side of (3.13) is $\le \alpha_j$. By (3.11), $\alpha_j \to 0$ as $j \to \infty$. Thus $P_F[N > n_j] \to 0$ as $j \to \infty$, which implies (3.9). This completes the proof of part (a).
Now suppose that the assumption of part (b) is satisfied. The sequences $\{c_i\}$ and $\{n_i\}$ can be so chosen that, in addition to $\lim c_i = 0$ and $n_1 < n_2 < \cdots$,

(3.14)    $\liminf_{i \to \infty} i^{-1}(2n_i - n_{i+1}) > 0$

and

(3.15)    $\sum_{i=1}^{\infty} A(c_i) e^{-B(c_i) n_i} \le \alpha.$

(For instance, put $M(c) = \max[A(c), 1/B(c)]$; choose $c_1, c_2, \ldots$ so that $c_i > 0$, $\lim c_i = 0$ and $M(c_i) \le m i^{1/2}$, $i = 1, 2, \ldots$, with a suitable number $m > 0$; and put $n_i = ni$, where $n$ is so large that

$m \sum_{i=1}^{\infty} i^{1/2} e^{-n i^{1/2}/m} \le \alpha.)$

The inequalities (3.7) and (3.15) imply that conditions (3.11) and (3.12) are fulfilled, with $\alpha_i = A(c_i) e^{-B(c_i) n_i}$. Hence the conditions (3.8) are satisfied.
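For a fixed $F \in \mathcal{F}$, choose the integer $h$ so that $c_i < \delta^*/2$ for $i > h$. Then for $i > h$, due to (3.13) and (3.7),

$P_F[N > n_i] \le P_F[\delta(F_{n_i}, F) > \delta^*/2] \le a e^{-b n_i},$

where $a = A(\delta^*/2)$ and $b = B(\delta^*/2)$ are positive numbers.
Now for any real $t$

$E_F e^{tN} = \sum_{i=1}^{\infty} e^{t n_i} P_F[N = n_i] \le e^{t n_1} + \sum_{i=1}^{\infty} e^{t n_{i+1}} P_F[N > n_i].$

Thus $E_F e^{tN} < \infty$ if the series

$\sum_{i=1}^{\infty} \exp(t n_{i+1} - b n_i)$

converges. For $t = b/2$ we have $t n_{i+1} - b n_i = -\tfrac{1}{2} b (2n_i - n_{i+1})$, so that the series converges due to (3.14). The proof is complete.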


The assumption of Theorem 3.1, part (b) is satisfied if $\mathcal{F}$ is any set of $k$-dimensional distributions ($k \ge 1$) and $\delta = D$, the distance defined by (3.2). This is implied by the following theorem due to Kiefer and one of the authors [4]: For every integer $k \ge 1$ there exist two positive numbers $a$ and $b$ such that for all $c > 0$, all integers $n > 0$, and all $k$-dimensional distributions $F$

(3.16)    $P_F[D(F_n, F) \ge c] \le a e^{-b c^2 n}.$

(For $k = 1$ the inequality (3.16), with $b = 2$, was proved by Dvoretzky, Kiefer and one of the authors [2].) Hence we can state the following corollary.
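COROLLARY 3.1. If $\mathcal{F}$ is any set of $k$-dimensional distributions ($k \ge 1$), then any two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ for which

$\max[D(F, \mathcal{G}), D(F, \mathcal{H})] > 0$ if $F \in \mathcal{F}$

are distinguishable in the class of tests $(N, \phi)$ such that $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.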
























4. Necessary conditions for distinguishability. Let $P$ and $Q$ be two distributions on a $\sigma$-field $\mathcal{B}$ of subsets of an arbitrary space $Y$, and let $\Psi$ be the class of all measurable functions on $Y$ with values ranging from 0 to 1. We denote by $d$ the distance defined by

(4.1)    $d(P, Q) = \sup_{\psi \in \Psi} |E_P \psi - E_Q \psi|.$

We note some alternative expressions for $d$. Let $\nu$ be any $\sigma$-finite measure with respect to which $P$ and $Q$ are absolutely continuous (for instance, $\nu = P + Q$), and denote by $p$ and $q$ densities (Radon-Nikodym derivatives) of $P$ and $Q$ with respect to $\nu$. Then

(4.2)    $d(P, Q) = \int_{\{p > q\}} (p - q)\, d\nu = \frac{1}{2} \int |p - q|\, d\nu = 1 - \int \min(p, q)\, d\nu.$

(Here and in what follows, an integral whose domain of integration is not indicated is extended over the entire space.) Also

(4.3)    $d(P, Q) = \sup_{B \in \mathcal{B}} |P[B] - Q[B]|.$
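On a finite sample space the expressions in (4.2) and (4.3) can be evaluated directly and compared. The sketch below assumes that $P$ and $Q$ are given as lists of point probabilities over the same finite set (with counting measure playing the role of $\nu$); the function names are hypothetical.

```python
from itertools import combinations


def tv_positive_part(p, q):
    """Integral of (p - q) over the set where p > q."""
    return sum(pi - qi for pi, qi in zip(p, q) if pi > qi)


def tv_half_l1(p, q):
    """One half of the integral of |p - q|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_one_minus_overlap(p, q):
    """One minus the integral of min(p, q)."""
    return 1.0 - sum(min(pi, qi) for pi, qi in zip(p, q))


def tv_sup_over_events(p, q):
    """Supremum of |P[B] - Q[B]| over all subsets B (exponential cost; small spaces only)."""
    points = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for event in combinations(points, size):
            best = max(best, abs(sum(p[i] - q[i] for i in event)))
    return best
```

For $p = (0.5, 0.3, 0.2)$ and $q = (0.2, 0.3, 0.5)$ all four functions agree up to rounding error.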





For any distribution $G$ on $\mathcal{B}$ we denote by $G^{(n)}$ the distribution of $n$ independent chance variables each of which has the distribution $G$. We write $\mathcal{G}^{(n)}$ for the set of all $G^{(n)}$ such that $G \in \mathcal{G}$.

It is easily seen from (4.1) that

(4.4)    $d(G, H) \le d(G^{(n)}, H^{(n)}) \le d(G^{(n+1)}, H^{(n+1)}), \quad n = 1, 2, \ldots$

and from the last expression in (4.2), using the inequality $\min(ab, cd) \ge \min(a, c) \min(b, d)$, where $a, b, c, d$ are all positive, that

(4.5)    $d(G^{(n)}, H^{(n)}) \le 1 - (1 - d(G, H))^n \le n\, d(G, H).$

(See also Kruskal [6], p. 29.)
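Inequalities (4.4) and (4.5) can be checked numerically for distributions on a finite set by enumerating the product space. This is a sketch with hypothetical function names; enumeration costs $s^n$ operations for a space of $s$ points, so it is only meant for small cases.

```python
from itertools import product
from math import prod


def total_variation(p, q):
    """d(P, Q) = (1/2) * sum |p - q| for point probabilities on a common finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def n_fold_product(p, n):
    """Point probabilities of the n-fold product distribution, in lexicographic order."""
    return [prod(combo) for combo in product(p, repeat=n)]


def check_product_bounds(p, q, n, tol=1e-12):
    """Check d(G,H) <= d(G^(n),H^(n)) <= 1 - (1 - d(G,H))^n <= n d(G,H)."""
    d1 = total_variation(p, q)
    dn = total_variation(n_fold_product(p, n), n_fold_product(q, n))
    upper = 1.0 - (1.0 - d1) ** n
    return d1 <= dn + tol and dn <= upper + tol and upper <= n * d1 + tol
```

For $p = (0.5, 0.5)$, $q = (0.6, 0.4)$ and $n = 2$ the four quantities are approximately 0.10, 0.11, 0.19 and 0.20.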


The convex hull, $C\mathcal{P}$, of a set $\mathcal{P}$ of distributions on a common $\sigma$-field is defined as the set of all distributions $\lambda_1 P_1 + \cdots + \lambda_r P_r$, where $r$ is any positive integer, $P_1, \ldots, P_r$ are in $\mathcal{P}$, and $\lambda_1, \ldots, \lambda_r$ are positive numbers whose sum is 1.
In order that two sets $\mathcal{G}$ and $\mathcal{H}$ be finitely distinguishable it is necessary that

(4.6)    $d(C\mathcal{G}^{(n)}, C\mathcal{H}^{(n)}) > 0$

for some $n$ or, equivalently,

(4.7)    $\lim_{n \to \infty} d(C\mathcal{G}^{(n)}, C\mathcal{H}^{(n)}) = 1.$

This is known and follows easily from the definition (4.1) and Theorem 2.1.

If the set $\mathcal{G} \cup \mathcal{H}$ is dominated, that is to say, if the distributions in $\mathcal{G} \cup \mathcal{H}$ are absolutely continuous with respect to a fixed $\sigma$-finite measure, then condition (4.7) is also sufficient for $\mathcal{G}$ and $\mathcal{H}$ to be finitely distinguishable. This is contained in Theorem 6 of Kraft [7] and follows from a theorem of Le Cam (Theorem 5 of Kraft [7]) which is equivalent to the statement that if the set $\mathcal{P}_1 \cup \mathcal{P}_2$ is dominated, then

(4.8)    $\max_{\phi \in \Phi} \inf_{P_1 \in \mathcal{P}_1,\, P_2 \in \mathcal{P}_2} (E_{P_1} \phi - E_{P_2} \phi) = d(C\mathcal{P}_1, C\mathcal{P}_2),$

where $\Phi$ denotes the set of all measurable functions $\phi$ such that $0 \le \phi \le 1$.
If condition (4.6) is satisfied, then

(4.9)    $d(\mathcal{G}, \mathcal{H}) > 0.$

In fact, $d(C\mathcal{G}^{(n)}, C\mathcal{H}^{(n)}) \le d(\mathcal{G}^{(n)}, \mathcal{H}^{(n)}) \le n\, d(\mathcal{G}, \mathcal{H})$ by (4.5). This weaker but much simpler necessary condition for finite distinguishability will be shown in section 5 to be also sufficient under certain assumptions.

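To obtain necessary conditions for non-finite distinguishability we first prove the following lemma.

LEMMA 4.1. If

(4.10)    $d(F_0^{(n)}, C\mathcal{G}^{(n)}) = d(F_0^{(n)}, C\mathcal{H}^{(n)}) = 0$

for all $n$, then the sets $\mathcal{G}$ and $\mathcal{H}$ are indistinguishable in the class of tests $(N, \phi)$ with $P_{F_0}[N < \infty] = 1$.

PROOF. Let $(N, \phi)$ be any test such that $P_{F_0}[N < \infty] = 1$. Define $\phi_n = \phi$ if $N \le n$, $\phi_n = 0$ if $N > n$. Thus $\phi_n$ is a function of $x_1, \ldots, x_n$ only, and $\phi_n \le \phi$. Let $K$ be a member of $C\mathcal{G}^{(n)}$, so that $K = \lambda_1 G_1^{(n)} + \cdots + \lambda_r G_r^{(n)}$, $G_i \in \mathcal{G}$, $\lambda_i > 0$, $\sum \lambda_i = 1$. Then⁵

$E_K \phi_n = \sum \lambda_i E_{G_i} \phi_n \le \sum \lambda_i E_{G_i} \phi \le \sup_{G \in \mathcal{G}} E_G \phi.$

Hence

$E_{F_0} \phi_n - \sup_{G \in \mathcal{G}} E_G \phi \le E_{F_0} \phi_n - E_K \phi_n \le d(F_0^{(n)}, K)$

for all $K \in C\mathcal{G}^{(n)}$. Therefore

$E_{F_0} \phi_n - \sup_{G \in \mathcal{G}} E_G \phi \le d(F_0^{(n)}, C\mathcal{G}^{(n)}) = 0.$

Since $P_{F_0}(N > n) \to 0$ as $n \to \infty$, $E_{F_0} \phi_n$ converges to $E_{F_0} \phi$. Hence

(4.11)    $E_{F_0} \phi \le \sup_{G \in \mathcal{G}} E_G \phi.$

In a similar way, if we use, instead of $\phi_n$, the function $\bar\phi_n = \phi$ if $N \le n$, $\bar\phi_n = 1$ if $N > n$, we find that

(4.12)    $E_{F_0} \phi \ge \inf_{H \in \mathcal{H}} E_H \phi.$

Inequalities (4.11) and (4.12) imply the Lemma.

⁵ Here $E_K \phi_n$ denotes the expected value of $\phi_n$ when the joint distribution of $(X_1, \ldots, X_n)$ is $K$. We keep the notation $E_G \phi_n$ when $X_1, \ldots, X_n$ are independent and each $X_i$ has the distribution $G$.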


THEOREM 4.1. In order that the sets $\mathcal{G}$ and $\mathcal{H}$ be distinguishable ($\mathcal{F}$) it is necessary that

(4.13)    $\max[d(F^{(n)}, C\mathcal{G}^{(n)}), d(F^{(n)}, C\mathcal{H}^{(n)})] > 0$

for some $n$ if $F \in \mathcal{F}$ and hence that

(4.14)    $\max[d(F, \mathcal{G}), d(F, \mathcal{H})] > 0$ if $F \in \mathcal{F}$.

PROOF. The necessity of (4.13) follows immediately from Lemma 4.1. That (4.13) implies (4.14) follows from inequality (4.5).

That the condition (4.13) can be violated when inequality (4.14) is satisfied can be seen from an example given by Kraft ([7], p. 132) to show the non-equivalence of two conditions equivalent to (4.6) and (4.9). Nevertheless the simple necessary condition (4.14) is also sufficient under certain restrictions on the set of distributions, as will be seen in section 5.

We conclude this section by showing that a known necessary condition for distinguishability is implied by condition (4.14) of Theorem 4.1.

For any two distributions $F$ and $G$ on $\mathcal{B}$ and any set $\mathcal{G}$ of distributions on $\mathcal{B}$ define

$\tau(F, G) = \int f \log(f/g)\, d\nu, \qquad \tau(F, \mathcal{G}) = \inf_{G \in \mathcal{G}} \tau(F, G),$

where $\nu$ denotes a $\sigma$-finite measure with respect to which $F$ and $G$ are absolutely continuous, with densities $f$ and $g$. Note that $0 \le \tau(F, G) \le \infty$. It has been shown in [3] that if $\tau(F, \mathcal{G}) = 0$, then $F$ and $\mathcal{G}$ are indistinguishable in the class of tests with $E_F N < \infty$. Now

$-\tfrac{1}{2} \tau(F, G) = \int f \log (g/f)^{1/2}\, d\nu \le \log \int f (g/f)^{1/2}\, d\nu = \log \int (fg)^{1/2}\, d\nu$

and (see Kraft [7], Lemma 1)

$d(F, G) \le \left\{1 - \left(\int (fg)^{1/2}\, d\nu\right)^2\right\}^{1/2}.$

Hence $\tau(F, \mathcal{G}) = 0$ implies $d(F, \mathcal{G}) = 0$. Thus, by Lemma 4.1 (with $F_0 = F$ and $\mathcal{H}$ consisting only of $F$) $F$ and $\mathcal{G}$ are even indistinguishable in the class of tests with $P_F[N < \infty] = 1$. It is easy to construct examples where $d(F, \mathcal{G}) = 0$ and $\tau(F, \mathcal{G}) > 0$, so that condition (4.14) is actually better than the corresponding condition with $d$ replaced by $\tau$.


5. Necessary and sufficient conditions for distinguishability. In this section we shall show that the necessary conditions of section 4 are also sufficient for distinguishability under certain restrictions on the sets of distributions. Most of our results will be such that if the necessary condition is satisfied, the sets are not only distinguishable ($\mathcal{F}$), but even distinguishable in a stronger sense.

If $\mathcal{G}$ consists of a single distribution $G$, then, by Theorem 4.1, $G$ and $\mathcal{H}$ are distinguishable ($\mathcal{G} \cup \mathcal{H}$) only if $d(G^{(n)}, C\mathcal{H}^{(n)}) > 0$ for some $n$. If $\mathcal{H}$ is dominated, this condition is sufficient for $G$ and $\mathcal{H}$ to be finitely distinguishable, by the Le Cam-Kraft theorem mentioned in section 4. More generally, we can state the following.

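If $\mathcal{G}$ is finite and $\mathcal{H}$ is dominated, then $\mathcal{G}$ and $\mathcal{H}$ are either finitely distinguishable or are indistinguishable ($\mathcal{G} \cup \mathcal{H}$), depending on whether the condition

(5.1)    $\max[d(F^{(n)}, C\mathcal{G}^{(n)}), d(F^{(n)}, C\mathcal{H}^{(n)})] > 0$

for some $n$ if $F \in \mathcal{G} \cup \mathcal{H}$ is or is not satisfied. Condition (5.1) is equivalent to

(5.2)    $d(G^{(n)}, C\mathcal{H}^{(n)}) > 0$

for some $n$ if $G \in \mathcal{G}$.

That condition (5.1) is necessary for $\mathcal{G}$ and $\mathcal{H}$ to be distinguishable ($\mathcal{G} \cup \mathcal{H}$) follows from Theorem 4.1. On the other hand, if (5.1) is satisfied, so is (5.2). Hence if the distributions in $\mathcal{G}$ are denoted by $G_1, \ldots, G_r$, then, by Le Cam's theorem, $G_i$ and $\mathcal{H}$ are finitely distinguishable, for each $i$. Thus, given $\epsilon > 0$, there exists an integer $n$ and tests $(n, \phi_i)$ such that $E_{G_i} \phi_i < \epsilon$ and

$\sup_{H \in \mathcal{H}} E_H(1 - \phi_i) < \epsilon, \quad i = 1, \ldots, r.$

Put $\phi = \phi_1 \phi_2 \cdots \phi_r$. Then $\phi \le \phi_i$ and $1 - \phi \le \sum_{i=1}^{r} (1 - \phi_i)$. Hence $E_{G_i} \phi < \epsilon$ for all $i$ and $E_H(1 - \phi) < r\epsilon$ if $H \in \mathcal{H}$. Therefore $\mathcal{G}$ and $\mathcal{H}$ are finitely distinguishable, and condition (5.2) is equivalent to (5.1).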

If both $\mathcal{G}$ and $\mathcal{H}$ are countably infinite sets, it is no longer true that $\mathcal{G}$ and $\mathcal{H}$ are either finitely distinguishable or indistinguishable. To see this, let $\mathcal{G} = \{G_j\}$ and $\mathcal{H} = \{H_j\}$, $j = 1, 2, \ldots$, where $G_j$ and $H_j$ are univariate normal distributions with respective means $a$ and $b$ ($a \ne b$) and common variance $\sigma_j^2$, such that $\lim \sigma_j^2 = \infty$. It follows from a result of Stein [8] (or from Theorem 3.1) that $\mathcal{G}$ and $\mathcal{H}$ are distinguishable ($\mathcal{N}$), where $\mathcal{N}$ denotes the class of univariate normal distributions. But one readily verifies that $\lim_j d(G_j, H_j) = 0$, so that the sets are finitely indistinguishable.
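The limit $\lim_j d(G_j, H_j) = 0$ in this example can be made explicit. For two normal distributions with means $a$, $b$ and common standard deviation $\sigma$, the densities cross at the midpoint of the means, which gives $d = 2\Phi(|a - b|/(2\sigma)) - 1$, where $\Phi$ is the standard normal distribution function. The following sketch (the function name is hypothetical) evaluates this closed form through the error function.

```python
import math


def tv_equal_variance_normals(a, b, sigma):
    """d(N(a, sigma^2), N(b, sigma^2)) = 2 * Phi(|a - b| / (2 * sigma)) - 1.

    Uses the identity 2 * Phi(z) - 1 = erf(z / sqrt(2)).
    """
    return math.erf(abs(a - b) / (2.0 * math.sqrt(2.0) * sigma))


if __name__ == "__main__":
    # The distance shrinks toward 0 as the common standard deviation grows.
    for sigma in (1.0, 10.0, 100.0, 1000.0):
        print(sigma, tv_equal_variance_normals(0.0, 1.0, sigma))
```

Since the distance between corresponding members tends to 0, $d(\mathcal{G}, \mathcal{H}) = 0$ and the necessary condition (4.9) for finite distinguishability fails.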

In what follows it will be shown that the simple condition

(5.3)    $\max[d(F, \mathcal{G}), d(F, \mathcal{H})] > 0$ if $F \in \mathcal{F}$

which, by Theorem 4.1, is necessary for $\mathcal{G}$ and $\mathcal{H}$ to be distinguishable ($\mathcal{F}$), is also sufficient under rather general assumptions. Under somewhat more stringent assumptions the necessary condition $d(\mathcal{G}, \mathcal{H}) > 0$ for finite distinguishability (see (4.9)) will also be shown to be sufficient.

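A comparison of the results of sections 3 and 4 shows that if $\delta$ is any uniformly consistent distance in a set $\mathcal{F}$, then $d(\mathcal{G}, \mathcal{H}) = 0$ implies $\delta(\mathcal{G}, \mathcal{H}) = 0$ whenever $\mathcal{G} \subset \mathcal{F}$ and $\mathcal{H} \subset \mathcal{F}$. Theorems 3.1 and 4.1 also show that if the set $\mathcal{F}$ has the property that there exists a uniformly consistent $\delta$ such that, for all $F \in \mathcal{F}$ and all $\mathcal{G} \subset \mathcal{F}$, $\delta(F, \mathcal{G}) = 0$ implies (and hence is equivalent to) $d(F, \mathcal{G}) = 0$, then the necessary condition (4.14) is also sufficient for two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ to be distinguishable ($\mathcal{F}$). Similarly, if for all $\mathcal{G} \subset \mathcal{F}$ and all $\mathcal{H} \subset \mathcal{F}$, $\delta(\mathcal{G}, \mathcal{H}) = 0$ implies $d(\mathcal{G}, \mathcal{H}) = 0$, then any two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are finitely distinguishable if and only if $d(\mathcal{G}, \mathcal{H}) > 0$.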

We first consider conditions which ensure that $D(F, \mathcal{G}) = 0$ implies $d(F, \mathcal{G}) = 0$. Let $F$ and $G$ be two $k$-dimensional distributions and $\epsilon$ a nonnegative number. Suppose that there is an integer $J$ with the following property. There exist $J$ non-overlapping $k$-dimensional intervals $I_1, \ldots, I_J$ such that (i) $F - G$ is monotone⁶ in each $I_j$, and (ii) if $V$ denotes the complement of $\bigcup_{j=1}^{J} I_j$, then $\min(F[V], G[V]) \le \epsilon$. Write $J(F, G; \epsilon)$ for the least integer $J$ having this property. If such a finite $J$ does not exist, define $J(F, G; \epsilon) = \infty$.

Note that if $F - G$ is monotone in a set $C$, the difference of the densities, $f - g$, is of constant sign in $C$ except in a subset of probability 0 according to both $F$ and $G$.

LEMMA 5.1. If $F$ and $G$ are two $k$-dimensional distributions,

(5.4)    $d(F, G) \le 2^k J(F, G; \epsilon)\, D(F, G) + \epsilon.$

PROOF. We may assume that $J = J(F, G; \epsilon)$ is finite. Then there exist $J$ non-overlapping intervals $I_1, \ldots, I_J$ which satisfy the conditions (i) and (ii). We have

$2d(F, G) = \sum_{j=1}^{J} \int_{I_j} |f - g|\, d\nu + \int_V |f - g|\, d\nu,$

$\int_V |f - g|\, d\nu \le \int_V (f + g)\, d\nu = 2\int_V f\, d\nu + \int_V (g - f)\, d\nu$

$\qquad = 2F[V] + \sum_{j=1}^{J} \int_{I_j} (f - g)\, d\nu \le 2F[V] + \sum_{j=1}^{J} \int_{I_j} |f - g|\, d\nu,$

$\int_{I_j} |f - g|\, d\nu = \left|\int_{I_j} (f - g)\, d\nu\right| \le 2^k D(F, G).$

Hence

$2d(F, G) \le 2J \cdot 2^k D(F, G) + 2F[V].$

By symmetry, the term $2F[V]$ can be replaced by $2G[V]$, and hence also by $2\epsilon$. This implies (5.4).

⁶ An additive function $L$ on $\mathcal{B}$ is monotone in a set $C$ if either $L[A] \le L[B]$ whenever $A \subset B \subset C$ or $L[A] \ge L[B]$ whenever $A \subset B \subset C$.
THEOREM 5.1. Let $\mathcal{F}$ be a set of $k$-dimensional distributions, $k \ge 1$. (a) If

(5.5)    $\sup_{G \in \mathcal{F}} J(F, G; \epsilon) < \infty$

for all $F \in \mathcal{F}$ and all $\epsilon > 0$, then two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are distinguishable ($\mathcal{F}$) if and only if

(5.6)    $\max[d(F, \mathcal{G}), d(F, \mathcal{H})] > 0$

for all $F \in \mathcal{F}$. Moreover, if condition (5.6) is satisfied, then $\mathcal{G}$ and $\mathcal{H}$ are distinguishable in the class of tests $(N, \phi)$ such that $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.
(b) If

(5.7)    $\sup_{F \in \mathcal{F},\, G \in \mathcal{F}} J(F, G; \epsilon) < \infty$

for all $\epsilon > 0$, then two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are finitely distinguishable if and only if

(5.8)    $d(\mathcal{G}, \mathcal{H}) > 0.$

PROOF. The necessity of conditions (5.6) and (5.8) has been proved in section 4. If condition (5.5) is satisfied, then, by Lemma 5.1, $D(F, \mathcal{G}) = 0$ implies $d(F, \mathcal{G}) = 0$ for all $F \in \mathcal{F}$ and all $\mathcal{G} \subset \mathcal{F}$. Hence if (5.6) is satisfied, the assumption of Corollary 3.1 is fulfilled, which implies part (a). The proof of part (b) is similar, referring to (3.4) with $\delta = D$.

The assumption of Theorem 5.1, part (b) (and hence that of part (a)) is satisfied for most parametric sets of univariate distributions which are commonly used as models in statistics. In such sets $\mathcal{F}$ the minimum number of intervals in which $f - g$ is of constant sign is usually bounded, and then even $\sup_{F \in \mathcal{F},\, G \in \mathcal{F}} J(F, G; 0)$ is finite. For example, if $F$ and $G$ are any two univariate normal distributions, then $J(F, G; 0) \le 3$. This is also true if the singular normal distributions (with zero variance) are included.

The assumption of part (a) is satisfied if $\mathcal{F}$ is any subclass of the class of all distributions on the subsets of a fixed countable set $S$. Since the points of $S$ can be arranged in a sequence, we may assume that $S$ is the set of the positive integers. If $F \in \mathcal{F}$ and $\epsilon > 0$, choose the integer $M$ so that $F[x > M] < \epsilon$. Since we can choose $M$ intervals each of which contains exactly one positive integer $\le M$, we have $J(F, G; \epsilon) \le M$ for all $G \in \mathcal{F}$, so that condition (5.5) is satisfied.

Actual statistical observations are either integer-valued or integer multiples of a fixed unit of measurement. In this sense it can be said that the assumption of part (a) is satisfied for all classes of distributions which actually occur in statistics.

If $\mathcal{G}$ and $\mathcal{H}$ are two arbitrary sets of distributions over a fixed countable set, then $\mathcal{G}$ and $\mathcal{H}$ can be finitely indistinguishable even when $d(\mathcal{G}, \mathcal{H}) > 0$. This is shown by the following example. For $r = 1, 2, \ldots$ and $k = 1, \ldots, r$ define the sets

Aen = {(j2""™ + 1): 


Let $G_r$ and $H_{rk}$ be the discrete distributions whose elementary probability functions are


g(x) = 2°"x(zx; Ap), he x(x) = 27°" x(x; Are), 


where $\chi(x; A) = 1$ or $0$ according as $x \in A$ or $x \notin A$. Let $\mathcal{G} = \{G_r\}$, $r = 1, 2, \ldots$, and $\mathcal{H} = \{H_{rk}\}$, $k = 1, \ldots, r$; $r = 1, 2, \ldots$. The reader can verify that


d(G, , H,«) S 


for all $r$, $s$, and $k$, so that $d(\mathcal{G}, \mathcal{H}) > 0$.

Now denote by $G_r^{(n)}$ and $H_{rk}^{(n)}$ the distributions of $n$ independent chance variables each of which has the distribution $G_r$ and $H_{rk}$, respectively, and by $g_r^{(n)}$ and $h_{rk}^{(n)}$ their elementary probability functions. Let $H_r^{(n)}$ denote the distribution in $C\mathcal{H}^{(n)}$ whose elementary probability function is

$h_r^{(n)} = r^{-1} \sum_{k=1}^{r} h_{rk}^{(n)}.$

Writing $g_r^{(n)}$, $g_r$, etc. for the chance variables $g_r^{(n)}(X_1, \ldots, X_n)$, $g_r(X_1)$, etc., and $E$ for the expected value when the distribution of $X_1$ is $G_r$, we have

$2d(G_r^{(n)}, H_r^{(n)}) = E\,|(h_r^{(n)}/g_r^{(n)}) - 1| \le \{E(h_r^{(n)}/g_r^{(n)})^2 - 1\}^{1/2}.$


We calculate 


Eth.” /g,) =r Zz > (E(h,; hiu/g))” = 1+ 
jal kml 

It follows that $\lim_{r \to \infty} d(G_r^{(n)}, H_r^{(n)}) = 0$ for every $n$. Therefore $d(\mathcal{G}^{(n)}, C\mathcal{H}^{(n)}) = 0$ for all $n$, so that the sets $\mathcal{G}$ and $\mathcal{H}$ are finitely indistinguishable. Note, however, that since $d(\mathcal{G}, \mathcal{H}) > 0$, the sets are distinguishable in the sense of part (a) of Theorem 5.1 with $\mathcal{F} = \mathcal{G} \cup \mathcal{H}$ and, more generally, with $\mathcal{F}$ denoting any class of distributions on the subsets of $\bigcup_{r=1}^{\infty} A_r$ such that condition (5.6) is satisfied.

We shall see that all conclusions of Theorem 5.1 are true also for arbitrary sets of $k$-dimensional normal distributions, for any $k \ge 1$. However, for $k > 1$ this cannot be deduced from Theorem 5.1 since the multidimensional $D$ distance does not have the properties required by the theorem. It can be shown that if $\mathcal{F}$ is any set of non-singular bivariate normal distributions, the assumption of part (a) is satisfied. But for arbitrary sets of bivariate (possibly singular) normal distributions, $D(F, \mathcal{G}) = 0$ does not imply $d(F, \mathcal{G}) = 0$. (For instance, if $F_c$ denotes the bivariate normal distribution with means $(c, -c)$, unit variances and correlation coefficient 1, and $\mathcal{G} = \{F_c \mid c > 0\}$, then $D(F_0, \mathcal{G}) = 0$ but $d(F_0, \mathcal{G}) = 1$.) Moreover, $D(\mathcal{G}, \mathcal{H}) = 0$ does not imply $d(\mathcal{G}, \mathcal{H}) = 0$ even for sets of non-singular bivariate normal distributions. (Thus if $G_c$ denotes the bivariate normal distribution with means $(c, -c)$, unit variances, and correlation coefficient $(1 + c^2)^{-1}$, if $\mathcal{G} = \{G_c \mid c < 0\}$ and $\mathcal{H} = \{G_c \mid c > 0\}$, then $D(\mathcal{G}, \mathcal{H}) = 0$ but $d(\mathcal{G}, \mathcal{H}) > 0$.)

For a fixed $k \ge 1$ let $\mathcal{N}$ denote the set of all $k$-dimensional normal distributions. To prove the statement at the beginning of the preceding paragraph it is sufficient to display a distance $\delta$ such that $\delta(\mathcal{G}, \mathcal{H}) = 0$ implies $d(\mathcal{G}, \mathcal{H}) = 0$ whenever $\mathcal{G} \subset \mathcal{N}$ and $\mathcal{H} \subset \mathcal{N}$, and $\delta$ satisfies assumption (3.7) of Theorem 3.1 with $\mathcal{F} = \mathcal{N}$. We shall show this to be true for the distance $\delta^*$ defined as follows.

For any $k$-dimensional distribution $F$ with finite moments of the second order define $\theta(F) = (\mu(F), \Sigma(F))$, where $\mu(F)$ denotes the vector of the means and $\Sigma(F)$ the covariance matrix of $F$. Denote by $\Theta$ the range of $\theta(F)$. Define the function $d^*(\theta_1, \theta_2)$, $\theta_i \in \Theta$, by

$d^*(\theta_1, \theta_2) = d(F_1, F_2)$ if $F_i \in \mathcal{N}$ and $\theta(F_i) = \theta_i$, $i = 1, 2$.

Now define $\delta^*$ by

$\delta^*(F_1, F_2) = d^*(\theta(F_1), \theta(F_2))$

for any two $k$-dimensional distributions $F_1$ and $F_2$ with finite moments of the second order.

The function $\delta^*$ is a distance⁷ in the set of distributions for which it is defined. Obviously $\delta^*(\mathcal{G}, \mathcal{H}) = 0$ if and only if $d(\mathcal{G}, \mathcal{H}) = 0$ for $\mathcal{G} \subset \mathcal{N}$ and $\mathcal{H} \subset \mathcal{N}$.

Now let $F_n$ be the empiric distribution of $n$ independent chance variables $X_1, \ldots, X_n$, each of which has the distribution $F \in \mathcal{N}$. Put $\theta(F) = \theta = (\mu, \Sigma)$ and $\theta(F_n) = \hat\theta = (\hat\mu, \hat\Sigma)$. Thus $\hat\mu$ is the sample mean vector and $\hat\Sigma$ the sample covariance matrix. We have

$\delta^*(F_n, F) = d^*(\hat\theta, \theta).$

It follows from the definition of $d^*$ that the distribution of $d^*(\hat\theta, \theta)$ does not change if each $X_i$ is subjected to the same non-singular linear transformation. Hence the distribution of $d^*(\hat\theta, \theta)$ depends only on the rank $r$ of $\Sigma$. If $r = k$, we may assume that $\theta = (0, I) = \theta_0$ (say), where $0$ denotes the zero vector with $k$ components and $I$ the $k \times k$ unit matrix. If $1 \le r < k$, the distribution of $d^*(\hat\theta, \theta)$ is the same, only with $k$ replaced by $r$. If $r = 0$, $d^*(\hat\theta, \theta) = 0$ with probability one. Thus we may confine ourselves to the case $r = k$, $\theta = \theta_0$. We have only to show that for every $c > 0$ there exist numbers $A(c)$ and $B(c)$ such that for all integers $n > 0$,

(5.9)    $P[d^*(\hat\theta, \theta_0) > c] < A(c) e^{-B(c) n}.$

⁷ Recall that $\delta^*(F_1, F_2) = 0$ need not imply $F_1 = F_2$.
Now the function $d^*(\hat\theta, \theta_0)$ is continuous at $\hat\theta = \theta_0$ in the usual sense. Hence it is easily seen that (5.9) is satisfied if for every $\epsilon > 0$ the probability of each of the inequalities

$|\hat\mu_i| > \epsilon, \quad |\hat\sigma_{ii} - 1| > \epsilon, \quad |\hat\rho_{ij}| > \epsilon, \qquad i \ne j;\ i, j = 1, \ldots, k,$

where $\hat\rho_{ij} = \hat\sigma_{ij} (\hat\sigma_{ii} \hat\sigma_{jj})^{-1/2}$, and $\hat\mu_i$ and $\hat\sigma_{ij}$ are the components of $\hat\mu$ and $\hat\Sigma$, does not exceed a bound of the form $A(\epsilon) \exp(-B(\epsilon) n)$ with $B(\epsilon) > 0$. That the latter is true is seen by considering the well-known distributions of $\hat\mu_i$, $\hat\sigma_{ii}$, and $\hat\rho_{ij}$. This completes the proof.

In the proof we could have equally well used, instead of $d$, the distance

$d_1(F, G) = \left\{\int (f^{1/2} - g^{1/2})^2\, d\nu\right\}^{1/2} = \{2 - 2\rho(F, G)\}^{1/2},$

where $\nu$ denotes a measure such that $F$ and $G$ have densities, $f$ and $g$, with respect to $\nu$, and

$\rho(F, G) = \int (fg)^{1/2}\, d\nu.$

For we have (see, for instance, Kraft [7], Lemma 1)

$1 - \rho(F, G) \le d(F, G) \le (1 - \rho^2(F, G))^{1/2},$

so that the distances $d$ and $d_1$ are equivalent for our purposes.
Define $d_1^*(\theta_1, \theta_2)$ and $\delta_1^*(F, G)$ in terms of $d_1$ just like $d^*$ and $\delta^*$ were defined in terms of $d$. We shall write $\rho(\theta_1, \theta_2)$ for $\rho(F_1, F_2)$ if $F_i \in \mathcal{N}$, $\theta(F_i) = \theta_i$. Thus $d_1^*(\theta_1, \theta_2) = 2^{1/2}(1 - \rho(\theta_1, \theta_2))^{1/2}$. If $\Sigma_1$ and $\Sigma_2$ are nonsingular,

(5.10)    $\rho(\theta_1, \theta_2) = |\Sigma_1|^{1/4} |\Sigma_2|^{1/4} \left| \frac{\Sigma_1 + \Sigma_2}{2} \right|^{-1/2} \exp\left\{ -\tfrac{1}{4} (\mu_1 - \mu_2)' (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2) \right\},$

where $\mu_1$ and $\mu_2$ are regarded as column vectors and the prime denotes the transpose. (Compare Kraft [7], p. 129, where there are some misprints.) If $\Sigma_1$ has rank $r$, $1 \le r < k$, then $\rho(\theta_1, \theta_2) = 0$ unless $\Sigma_2$ also has rank $r$ and the normal distributions with $\theta = \theta_1$ and $\theta = \theta_2$ assign probability one to the same $r$-dimensional plane, $H$; in this case $\rho(\theta_1, \theta_2)$ is equal to an expression like (5.10), with $\mu_i$ and $\Sigma_i$ now denoting the means and covariances, in a common coordinate system, of the corresponding $r$-dimensional normal distributions on $H$. If the rank of $\Sigma_1$ is 0, then $\rho(\theta_1, \theta_2) = 0$ or 1 according as $\theta_1 \ne \theta_2$ or $\theta_1 = \theta_2$.
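The closed form (5.10) can be checked numerically in the nonsingular case. The following is a minimal sketch, not part of the original paper: the function names `normal_affinity` and `affinity_by_quadrature` are hypothetical, and the quadrature check is restricted to the univariate case, where $\rho$ is approximated by a plain Riemann sum of $(fg)^{1/2}$.

```python
import numpy as np

def normal_affinity(mu1, S1, mu2, S2):
    """Affinity rho = integral of sqrt(f*g) for two nonsingular normal laws,
    evaluated from the closed form (5.10)."""
    mu1 = np.atleast_1d(mu1).astype(float)
    mu2 = np.atleast_1d(mu2).astype(float)
    S1 = np.atleast_2d(S1).astype(float)
    S2 = np.atleast_2d(S2).astype(float)
    diff = mu1 - mu2
    s_sum = S1 + S2
    det_factor = (np.linalg.det(S1) * np.linalg.det(S2)) ** 0.25
    det_factor /= np.sqrt(np.linalg.det(s_sum / 2.0))
    quad = diff @ np.linalg.solve(s_sum, diff)
    return float(det_factor * np.exp(-0.25 * quad))

def affinity_by_quadrature(mu1, sd1, mu2, sd2, n=200001, width=12.0):
    """Univariate check: Riemann sum of sqrt(f*g) on a wide, fine grid."""
    lo = min(mu1 - width * sd1, mu2 - width * sd2)
    hi = max(mu1 + width * sd1, mu2 + width * sd2)
    x = np.linspace(lo, hi, n)
    f = np.exp(-0.5 * ((x - mu1) / sd1) ** 2) / (sd1 * np.sqrt(2 * np.pi))
    g = np.exp(-0.5 * ((x - mu2) / sd2) ** 2) / (sd2 * np.sqrt(2 * np.pi))
    return float(np.sum(np.sqrt(f * g)) * (x[1] - x[0]))

# N(0, 1) against N(1, 4): closed form and quadrature should agree closely.
rho_closed = normal_affinity(0.0, [[1.0]], 1.0, [[4.0]])
rho_numeric = affinity_by_quadrature(0.0, 1.0, 1.0, 2.0)
d1 = np.sqrt(2.0 * (1.0 - rho_closed))   # the distance d_1 = {2 - 2 rho}^{1/2}
```

For identical parameters the function returns 1, in line with the remark that $\rho(\theta_1, \theta_2) = 1$ if and only if $\theta_1 = \theta_2$.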

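If $\Gamma$ and $\Delta$ are subsets of $\Theta$, write $\rho(\theta, \Delta)$ for $\sup_{\theta' \in \Delta} \rho(\theta, \theta')$ and $\rho(\Gamma, \Delta)$ for $\sup_{\theta \in \Gamma} \rho(\theta, \Delta)$. If $\mathcal{G} \subset \mathcal{N}$, define $\theta(\mathcal{G}) = \{\theta(F) \mid F \in \mathcal{G}\}$.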

Expressing the conditions (5.6) and (5.8) of Theorem 5.1 in terms of p, we can 
summarize the foregoing as follows. 
THEOREM 5.2. Let $\mathcal{N}$ be the set of all $k$-dimensional normal distributions, $k \ge 1$.

(a) If $\mathcal{F} \subset \mathcal{N}$, then two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{F}$ are distinguishable ($\mathcal{F}$) if and only if

(5.11)    $\min [\rho(\theta(F), \theta(\mathcal{G})),\ \rho(\theta(F), \theta(\mathcal{H}))] < 1$

for all $F \in \mathcal{F}$. Moreover, if condition (5.11) is satisfied, $\mathcal{G}$ and $\mathcal{H}$ are distinguishable in the class of tests $(N, \phi)$ such that $E_F e^{tN} < \infty$ for some $t = t(F) > 0$ if $F \in \mathcal{F}$.

(b) Two subsets $\mathcal{G}$ and $\mathcal{H}$ of $\mathcal{N}$ are finitely distinguishable if and only if

(5.12)    $\rho(\theta(\mathcal{G}), \theta(\mathcal{H})) < 1.$

We observe that condition (5.11) can be expressed in an alternative form. Note that $\rho(\theta_1, \theta_2) = 1$ if and only if $\theta_1 = \theta_2$. If $\theta = (\mu, \Sigma) \in \Theta$, where $\Sigma$ is nonsingular, and $\Delta \subset \Theta$, then $\rho(\theta, \Delta) = 1$ if and only if there is a sequence $\{\theta_j\}$ in $\Delta$ such that each of the real components of $\theta_j$ converges to the corresponding component of $\theta$ (in the ordinary sense). If $\Sigma$ is singular of rank $r$, the same is true, but with the additional condition that the normal distributions with parameters $\theta_j$ and $\theta$ assign probability one to the same $r$-dimensional plane. Thus, for instance, if $\mathcal{F}$ is a set of non-singular distributions, condition (5.11) is equivalent to the statement that, for every $F \in \mathcal{F}$, the Euclidean distance of $\theta(F)$ from $\theta(\mathcal{G})$ or from $\theta(\mathcal{H})$ is positive.

Condition (5.12) does not seem to have an equally simple interpretation. 

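By way of illustration, let $\mathcal{G}$ and $\mathcal{H}$ denote two sets of univariate normal distributions with positive variances such that $\mu < 0$ if $(\mu, \sigma) \in \theta(\mathcal{G})$ and $\theta(\mathcal{H}) = \{(\mu, \sigma) \mid (-\mu, \sigma) \in \theta(\mathcal{G})\}$. Then $\mathcal{G}$ and $\mathcal{H}$ are finitely distinguishable if and only if $\mu/\sigma$ is bounded away from 0 in $\theta(\mathcal{H})$. They are always distinguishable ($\mathcal{G} \cup \mathcal{H}$). If $\mathcal{F}$ denotes a set of normal distributions with positive variances which contains $\mathcal{G} \cup \mathcal{H}$, then $\mathcal{G}$ and $\mathcal{H}$ are distinguishable ($\mathcal{F}$) if and only if the distance of every point $(0, \sigma) \in \theta(\mathcal{F})$ from $\theta(\mathcal{H})$ is positive.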
THE STRUCTURE OF BIVARIATE DISTRIBUTIONS


By H. O. Lancaster


School of Public Health and Tropical Medicine, Sydney, Australia 


1. Introduction. K. Pearson [18] in his study on the association between two chance variables defined a measure, the mean square contingency, $\phi^2 = \chi^2/N$, where $\chi^2$ is that usually calculated in a contingency table with fixed marginal totals, and $N$ is the size of the sample. In a bivariate joint normal distribution with coefficient of correlation, $\rho$, Pearson showed that $\phi^2$ would have a limiting value if the sample size became indefinitely large, while the subdivisions of the marginal distributions were made increasingly fine. In effect, he was considering a property of the parent joint normal distribution, rather than of a sample drawn from it. He noted that this limiting $\phi^2$ was independent of the scale of the marginal variables and was invariant under any bi-unique transformations of the marginal variables of the form, $x \to x'(x)$, $y \to y'(y)$. If the distribution was the bivariate joint normal, he showed that $\rho^2 = \phi^2/(1 + \phi^2)$. In some distributions, jointly normal with appropriate choice of the marginal variable, but not so with the variables actually chosen, he took the value of $\rho^2$ still to have the meaning that an appropriate transformation would yield the variables of the underlying joint normal distribution.

Hirschfeld [8], considering contingency tables with a finite number of discrete values of the variables, sought for transformations of the marginal variables that would yield linear least squares regression lines. He found that these variables maximised the coefficients of correlation.


Fisher [3] defined a set of variables on each of the marginal distributions of an $m \times n$ contingency table, such that $x_i = 1$ for an observation falling into the $i$th class and $x_i = 0$ elsewhere for $i = 1, 2, \dots, m - 1$, and similarly for $y_j$ with $j = 1, 2, \dots, (n - 1)$. His problem was to find a linear form in the $x_i$, which would have maximum correlation with any linear form in the $y_j$. For convenience, these linear forms were considered without loss of generality as being normalised. Fisher referred to such a variable and the corresponding correlation as canonical and thus identified them with the canonical variables and correlation of Hotelling [10]. Fisher's theory was amplified by Maung [13] and Williams [25], who considered observational data in the form of a contingency table. We shall see later that in this case, the problem of finding the canonical correlations is equivalent to the determination of the canonical form of a rectangular matrix under pre- and post-multiplication by orthogonal matrices.

It is of interest to extend this type of analysis to the theoretical parent population and to more general classes of bivariate distributions. Lancaster [12] applied the methods of the theory of integral equations to find the canonical correlations and variables in the joint normal distribution and this work leads to a generalisation of the canonical correlation theory. If the correlation is to have meaning, the canonical variables must have a finite variance, so that each canonical variable can be expressed as an orthonormal linear form in a complete set of orthogonal functions defined on the marginal distribution. The problem is now one in eigenvalue theory. Indeed, it is shown that the canonical correlations are the eigenvalues and the canonical variables on each marginal distribution form a subset, perhaps improper, of a complete set; the canonical variables are, moreover, the eigenfunctions except for a factor. This analysis holds provided the limiting value of Pearson's $\phi^2$ is finite. If $\phi^2$ is finite, it is further shown that the bivariate distribution can be expanded in an eigenfunction expansion. $\phi^2$ is then the sum of the squares of the canonical correlations. The contingency table is then shown to be a special case of the general theory.

Once the canonical form of a bivariate population, that is, the eigenfunction expansion, has been obtained, some further applications of the theory can be made. First, the regressions take a particularly simple form and are confirmed to be the solution of Hirschfeld's problem. Second, given the marginal distributions it is possible to obtain bivariate distributions with prescribed correlations. Third, a goodness of fit test can be devised for the bivariate joint normal distribution, which displays as components of $\chi^2$, the contributions of the regressions of the $i$th Hermite-Chebishev polynomial in $x$ on the $j$th polynomial in $y$. The test is made of the total contributions from those pairs for which $i \ne j$.


2. Pearson's $\phi^2$ as the Sum of Squares of the Correlation Coefficients. K. Pearson [18] introduced $\phi^2$ as the “mean square contingency” for a bivariate distribution in order to derive a measure of association independent of the sample size, $N$. He wrote $\phi^2 = \chi^2/N$. Pearson saw that $\chi^2$ (or rather $\phi^2$) had a use as a descriptive measure, whereas it is usually thought of as a criterion of goodness of fit, e.g., as in the test due to Pearson [16]. It is convenient to modify Pearson's definition by using the integral sign in the sense of Lebesgue-Stieltjes and adopting the notation of Hellinger [7], which has been justified by Hobson [9].

DEFINITION.

(1A)    $\phi^2 = \iint \{dF(x, y)\}^2 / \{dG(x)\, dH(y)\} - 1$

(1B)    $\phantom{\phi^2} = \iint \Omega^2(x, y)\, dG(x)\, dH(y) - 1,$

where

(2)    $\Omega(x, y) = dF(x, y) / \{dG(x)\, dH(y)\}.$

$\Omega(x, y)$, and so the integrand of (1A), is to be taken as zero, if the point $(x, y)$ does not correspond to points of increase of both $G(x)$ and $H(y)$. $\phi^2$ can evidently be regarded as the limit of the sum $\sum_i \sum_j f_{ij}^2 / (f_{i.} f_{.j}) - 1$, where $f_{ij}$ is the weight of the bivariate distribution corresponding to marginal sets, $A_i$ and $B_j$, and where $f_{i.}$ and $f_{.j}$ are the weights of the marginal distributions corresponding to the same sets.
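For a finite table the sum in this definition can be evaluated directly. The sketch below is an illustration only, not part of the original paper; the function name `phi_squared` and the two small test tables are hypothetical choices. It treats cells whose marginal weights vanish as contributing zero, following the convention stated for $\Omega(x, y)$.

```python
import numpy as np

def phi_squared(table):
    """Pearson's mean square contingency: sum f_ij^2 / (f_i. f_.j) - 1."""
    f = np.asarray(table, dtype=float)
    f = f / f.sum()                          # cell proportions f_ij
    row = f.sum(axis=1, keepdims=True)       # f_i.
    col = f.sum(axis=0, keepdims=True)       # f_.j
    product = row * col
    mask = product > 0                       # cells off the points of increase count as zero
    return float(np.sum(f[mask] ** 2 / product[mask]) - 1.0)

# Independence gives phi^2 = 0; a 3 x 3 table concentrated on the diagonal gives phi^2 = 2.
independent = np.outer([0.2, 0.3, 0.5], [0.6, 0.4])
diagonal = np.eye(3)
phi2_independent = phi_squared(independent)
phi2_diagonal = phi_squared(diagonal)
```

Multiplying the result by the sample size $N$ gives the usual $\chi^2$ statistic of the table, which is the relation $\phi^2 = \chi^2/N$ used in the text.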
Examples of $\phi^2$-bounded distributions are provided by the joint distribution of independent stochastic variables, in which case $\phi^2$ is zero, and by the bivariate normal distribution with the absolute value of the correlation less than unity. All discrete distributions with finitely many points of increase in both variables will also have a finite $\phi^2$. A case of special interest is provided by the bivariate joint normal distribution. In this distribution we may write $g(x)\, dx$ and $h(y)\, dy$ in place of $dG(x)$ and $dH(y)$ respectively and $f(x, y)\, dx\, dy$ in place of $dF(x, y)$. Pearson derived the relation,

(3)    $\phi^2 = \iint f^2(x, y) / \{g(x) h(y)\}\, dx\, dy - 1 = \rho^2 / (1 - \rho^2),$

where $|\rho| < 1$. This result has been discussed by Lancaster [12]. However, if $|\rho| = 1$ and so the bivariate normal distribution is singular, $\phi^2$ is unbounded. Indeed, $\phi^2$ is unbounded for any bivariate distribution distributed along a straight line, with infinitely many points of increase.

It follows from the definition by an analysis similar to that used to justify the Riemann integral that $\phi^2$ is uniquely determined by the passage to the limit if it is bounded.

DEFINITION. Let $\{x^{(i)}\}$ and $\{y^{(i)}\}$ be complete sets of orthonormal functions defined on the marginal distributions, $G(x)$ and $H(y)$, respectively by

(4)    $\int x^{(i)} x^{(j)}\, dG(x) = \int y^{(i)} y^{(j)}\, dH(y) = \delta_{ij}.$

Let $\rho_{ij}$ be the correlation coefficients,

(5)    $\rho_{ij} = \iint x^{(i)} y^{(j)}\, dF(x, y).$

By the Schwarz inequality $\rho_{ij}$ always exists and is not greater than unity in absolute value. Further,

(6)    $\rho_{00} = 1, \quad \rho_{0k} = \rho_{k0} = 0, \quad k > 0.$


The following discussion gives a statistical content to some well known analy- 
sis. The steps taken can be justified by the theory of integral equations as set 
out in Courant and Hilbert [2] or Riesz and Szent-Nagy [22]. 

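THEOREM 1. If $F(x, y)$ is a $\phi^2$-bounded distribution and if

(7)    $S_{mn} \equiv S_{mn}(x, y) = \sum_{i=0}^{m} \sum_{j=0}^{n} \lambda_{ij} x^{(i)} y^{(j)},$

then

(8)    $Q_{mn} = \iint (\Omega - S_{mn})^2\, dG(x)\, dH(y)$

is minimised by taking

(9)    $\lambda_{ij} = \rho_{ij}, \quad i = 0, 1, 2, \dots, m;\ j = 0, 1, 2, \dots, n.$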
Writing $S$ for $S_{mn}$ as $m \to \infty$ and $n \to \infty$,

(10)    $\Omega(x, y) = S(x, y)$, almost everywhere

and

(11)    $\phi^2 = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \rho_{ij}^2.$

PROOF. The set $\{x^{(i)}\} \times \{y^{(j)}\}$ is complete over the distribution $G(x) \times H(y)$, and $\Omega(x, y)$, as defined in (2), is square summable by (1B) and the hypothesis of the theorem. The result (9) follows by differentiating (8) with regard to $\lambda_{ij}$ for $i = 0, 1, 2, \dots, m$; $j = 0, 1, 2, \dots, n$. For any finite $m$ and $n$, the sum $\sum_{i=1}^{m} \sum_{j=1}^{n} \rho_{ij}^2 \le \phi^2$, so that $\sum_{i,j} \rho_{ij}^2$ converges. The completeness assures the truth of (10) and of (11), which is the Parseval equality.

It is our aim now to redefine the sets $\{x^{(i)}\}$ and $\{y^{(i)}\}$ so that the correlation matrix,

(12)    $R = (\rho_{ij}),$

assumes as simple a form as possible. The theorems of the next section show that $R$ is diagonal if we choose, for the sets $\{x^{(i)}\}$ and $\{y^{(i)}\}$, the canonical variables in the sense of Fisher. The chief difficulty lies in the need to prove that the canonical variables form subsets of complete sets of orthonormal functions. We have, therefore, to proceed indirectly.


3. The Canonical Variables. The canonical variables have been defined on discrete distributions with finitely many points of increase. They are usually thought of as “scores to be assigned” but may also be thought of as functions of the marginal variables. Often no marginal variable has been explicitly defined; then, we may take the row or column position as the variable. The following definition may be regarded as the appropriate extension of Fisher's definition.

DEFINITION. The canonical variables (or functions) are two sets of orthonormal 
functions defined on the marginal distributions in a recursive manner such that 
the correlation between corresponding members of the two sets is maximal. 
Unity may be considered as a member of zero order of each set of variables. 
Symbolically, the orthogonal and normalising conditions are 


(13)
$\xi^{(i)} = \xi^{(i)}(x), \quad \eta^{(i)} = \eta^{(i)}(y),$
$\int \xi^{(i)}\, dG(x) = \int \eta^{(i)}\, dH(y) = 0,$
$\int \xi^{(i)2}\, dG(x) = \int \eta^{(i)2}\, dH(y) = 1,$
$\int \xi^{(i)} \xi^{(j)}\, dG(x) = \int \eta^{(i)} \eta^{(j)}\, dH(y) = 0, \quad i \ne j,$

and the maximisation conditions are that

(14)    $\rho_i = \operatorname{corr}(\xi^{(i)}, \eta^{(i)}) = \iint \xi^{(i)} \eta^{(i)}\, dF(x, y)$

should be maximal for each $i$, given the preceding canonical variables. The $\rho_i$ are the canonical correlations and can by convention be taken always to be positive.


THEOREM 2. The canonical variables obey a second set of orthogonal conditions,

(15)    $E(\xi^{(i)} \eta^{(j)}) = \iint \xi^{(i)} \eta^{(j)}\, dF(x, y) = 0, \quad i \ne j.$

PROOF. For definiteness, let $j > i$. By hypothesis $E(\xi^{(i)} \eta^{(i)})$ is maximal in the sense of the definition above and is equal to $\rho_i$, say. Suppose that $E(\xi^{(i)} \eta^{(j)})$ is not zero but equal to $\rho_i \tan\theta$. Now $\eta^{(j)}$ has been defined according to (13) and so the function, $\cos\theta\, \eta^{(i)} + \sin\theta\, \eta^{(j)}$, obeys all the necessary orthogonal and normalising conditions, and its correlation with $\xi^{(i)}$ is easily found to be $\rho_i \sec\theta$. This is greater than $\rho_i$, so a contradiction results and the theorem is proved.

As has been already noted, the canonical functions are necessarily square 
summable and so can be written as linear forms in any complete set of ortho- 
normal functions, defined on the marginal distributions. Thus we can write 


(16)
$\xi^{(i)} = \sum_{k=1}^{\infty} a_{ik} x^{(k)}, \quad \sum_{k} a_{ik}^2 = 1,$
$\eta^{(i)} = \sum_{k=1}^{\infty} b_{ik} y^{(k)}, \quad \sum_{k} b_{ik}^2 = 1.$

Now let us determine $\xi^{(1)}$ and $\eta^{(1)}$ in terms of the $\{x^{(i)}\}$ and $\{y^{(j)}\}$ respectively.

(17)    $\operatorname{corr}(\xi^{(1)}, \eta^{(1)}) = \operatorname{corr}\left( \sum_i a_i x^{(i)},\ \sum_j b_j y^{(j)} \right) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_i b_j \rho_{ij}.$

Now $\sum_{i,j} \rho_{ij}^2$ is convergent and so the bilinear form on the right of (17) can be treated by the theory of quadratic forms in infinitely many variables. The normalising conditions (13) assure us that $\sum_i a_i^2 = 1$ and $\sum_j b_j^2 = 1$ and that neither $\xi^{(1)}$ nor $\eta^{(1)}$ contains any constant term. The bilinear form will have an attained maximum value for variations in the $a_i$ and $b_j$. We take the coefficients of one such maximum to define a new set of variables


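(18)
$x^{*(1)} = \xi^{(1)} = \sum_i a_i x^{(i)},$
$x^{*(2)} = a_{21} x^{(1)} + a_{22} x^{(2)} + \cdots,$
$x^{*(3)} = a_{31} x^{(1)} + a_{32} x^{(2)} + \cdots,$
$\cdots$

where the $a_{2i}, a_{3i}, \dots$ are chosen to satisfy the orthogonal and normalising conditions. A similar transformation is applied to the $y^{(i)}$:

(19)
$y^{*(1)} = \eta^{(1)} = \sum_i b_i y^{(i)},$
$y^{*(2)} = b_{21} y^{(1)} + b_{22} y^{(2)} + \cdots,$
$y^{*(3)} = b_{31} y^{(1)} + b_{32} y^{(2)} + \cdots.$

But now the correlation matrix, $R^* = (\rho^*_{ij})$, in the new variables is simpler in that, because of Theorem 2,

(20)    $\rho^*_{1i} = \rho^*_{i1} = 0, \quad i > 1.$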


We can proceed similarly to find $\xi^{(2)}$ and $\eta^{(2)}$ in terms of the $\{x^{*(i)}\}$ and $\{y^{*(j)}\}$ respectively. Since $\xi^{(2)}$ is orthogonal to $\xi^{(1)}$,

(21)    $\xi^{(2)} = \sum_{i=2}^{\infty} a_i^* x^{*(i)},$

and similarly,

(22)    $\eta^{(2)} = \sum_{j=2}^{\infty} b_j^* y^{*(j)},$

with $\sum_i a_i^{*2} = \sum_j b_j^{*2} = 1$. Now to find $\xi^{(2)}$ and $\eta^{(2)}$ we shall have to maximise $\sum_{i=2}^{\infty} \sum_{j=2}^{\infty} a_i^* b_j^* \rho^*_{ij}$. This again has an attained maximum and we take again a new set of variables

(23)
$x^{**(1)} = x^{*(1)} = \xi^{(1)},$
$x^{**(2)} = \xi^{(2)} = \sum_{i=2}^{\infty} a_i^* x^{*(i)},$
$x^{**(3)} = a^*_{32} x^{*(2)} + a^*_{33} x^{*(3)} + \cdots,$
$\cdots$

and similarly define $y^{**(1)}, y^{**(2)}, y^{**(3)}, \dots$ in terms of the $y^{*(i)}$. The correlation matrix is simplified again for now

(24)
$\rho^{**}_{2i} = 0$ for $i \ne 2$,
$\rho^{**}_{i2} = 0$ for $i \ne 2$.

This process may be continued a denumerable infinity of times or until all $\rho_{ij}$ are zero for $i > r$ or $j > r$ for some value of $r$. We may follow Williams [25] and refer to $r$ as the rank of the departure from independence. $r$ may be infinite. At each step, since the transformation is orthogonal, a complete set is transformed into a complete set. It is evident that we may pass from the sets $\{x^{(i)}\}$ and $\{y^{(i)}\}$ by a series of orthogonal transformations to complete sets of orthonormal functions, of which the sets $\{\xi^{(i)}\}$ and $\{\eta^{(i)}\}$ are subsets and conversely. We can sum up these results in
THEOREM 3. If $F(x, y)$ is a $\phi^2$-bounded bivariate distribution with marginal distributions, $G(x)$ and $H(y)$, then complete sets of orthonormal functions can be defined on the marginal distributions such that each member of a set of canonical variables appears as a member of the complete set of orthonormal functions. The element of frequency can be expressed in terms of the marginal distributions,

(25)    $dF(x, y) = \left\{ 1 + \sum_{i=1}^{\infty} \rho_i x^{(i)} y^{(i)} \right\} dG(x)\, dH(y)$, a.e.,

and

(26)    $\phi^2 = \sum_{i=1}^{\infty} \rho_i^2.$


PROOF. We have just proved the first statement. To prove the second we write, in the same way as in Theorem 1,

(27)    $Q \equiv \iint \{\Omega(x, y) - S_{mn}(x, y)\}^2\, dG(x)\, dH(y)$

and take the partial differentials of $Q$ with respect to $\lambda_{ij}$. Owing to the simplified form of the correlation matrix, $\rho_{ij}$ is now zero for $i \ne j$ and $\rho_{ii}$ is $\rho_i$. Since $\{x^{(i)}\} \times \{y^{(j)}\}$ is a complete set on $G(x) \times H(y)$, it follows that the minimised $Q$ tends to zero as $m \to \infty$ and $n \to \infty$, and (26), which is the Parseval equality, follows.

It may be proved that the choice of orthonormal functions is unique except for a convention as to sign if the $\rho_i$ form a pair-wise different set. It is assumed throughout that, once $x^{(i)}$ is chosen, $y^{(i)}$ is defined so as to give the expectation of $x^{(i)} y^{(i)}$ a positive value. If, however, $\rho_{j+1}, \rho_{j+2}, \dots, \rho_{j+k}$ are of equal magnitude and $x^{(j+1)}, x^{(j+2)}, \dots, x^{(j+k)}$ is one solution for the corresponding canonical variables, then every other solution is given by an arbitrary orthogonal transformation on these $x^{(j+1)}, \dots, x^{(j+k)}$ and the same transformation on the $y^{(j+1)}, \dots, y^{(j+k)}$.

A converse of Theorem 3 holds. 

THEOREM 4. If a bivariate distribution can be written in the form (25) with $\{x^{(i)}\}$ and $\{y^{(i)}\}$ forming complete sets on the marginal distributions and if $\sum_i \rho_i^2$ is finite, then the $\rho_i$ are the canonical correlations, $x^{(i)}$ and $y^{(i)}$ are the canonical variables and $\sum_i \rho_i^2 = \phi^2$.

PROOF. The proof is by induction. We suppose first that the $\rho_i$ are pairwise different. Then if $\xi$ and $\eta$ are the first pair of canonical variables

(28)    $\operatorname{corr}(\xi, \eta) = \operatorname{corr}\left( \sum_i a_i x^{(i)},\ \sum_i b_i y^{(i)} \right) = \sum_i a_i b_i \rho_i.$

Now $\sum_i a_i^2 = \sum_i b_i^2 = 1$ and Cauchy's inequality shows that the sum on the right of (28) is maximised by taking $a_1 = b_1 = 1$ and all other coefficients zero. Similarly, if $\rho_1 = \rho_2 = \cdots = \rho_k$, Cauchy's inequality shows that the correlation of $\xi$ and $\eta$ is $\rho_1$ if $\sum_{i=1}^{k} a_i^2 = 1$ and $a_i = b_i$ and that this is the maximum. Clearly however in this case too we can take $a_1 = b_1 = 1$, and once again $x^{(1)}$ and $y^{(1)}$ are the pair of first canonical variables or functions. We can proceed by induction to prove the main statement of the theorem. Defining $\Omega(x, y)$ as in (2) and writing out its value by the use of (25), we derive

$\sum_i \rho_i^2 = \phi^2.$

This is a generalisation of a result of Hirschfeld [8] and Maung [13] in the finite case. Further, we may note that Theorem 3 is a generalisation of the Mehler identity; for, using the notation of (3), we define complete sets of orthogonal functions $\{x^{(i)}\} \equiv \{\psi_i(x)\}$ and $\{y^{(i)}\} \equiv \{\psi_i(y)\}$ on the marginal distributions where $\psi_i(x)$ is a polynomial of precise degree $i$ standardised by the formula

(29)
$\int \psi_i(x) \psi_j(x) g(x)\, dx = \delta_{ij},$
$\int \psi_i(y) \psi_j(y) h(y)\, dy = \delta_{ij}.$

$g(x)$ and $h(y)$ have the same functional form in this case. By considering the expectation of $\exp\{tx - \tfrac{1}{2}t^2 + uy - \tfrac{1}{2}u^2\}$, namely $\exp(\rho t u)$, we find that

(30)    $E\, x^{(i)} y^{(j)} = \delta_{ij} \rho^i$

and Mehler's identity (Mehler [14]; Watson [24]) follows after Theorem 3 and continuity considerations. Conversely, given Mehler's identity, Theorem 4 shows that $|\rho|^i$ are the canonical correlations in this special case and the standardised Hermite-Chebishev polynomials, the canonical variables. Pearson [17] showed the great value of the Mehler identity in discussing normal correlation, although he and his collaborator, Bramley-Moore, failed to note that the tetrachoric expansion is indeed the Mehler identity. The Mehler identity is the special case when f(x) and g(y) are standardised normal distributions and h(x, y) is the bivariate normal distribution with coefficient of correlation, $\rho$. This identity is given in Szegő's textbook [27] on page 371, where Szegő has $x\sqrt{2}$ and $y\sqrt{2}$ corresponding to our $x$ and $y$ and $w$ for our $\rho$. Our $\psi_i(x)$ is $H_i(2^{-1/2}x)\,(2^i\, i!)^{-1/2}$ in his notation.
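Relation (30) lends itself to a direct numerical check. The sketch below is illustrative only and not part of the original paper: it takes $\psi_i(x) = He_i(x)/\sqrt{i!}$, with $He_i$ the probabilists' Hermite polynomial as implemented in NumPy, represents the correlated pair as $Y = \rho X + (1 - \rho^2)^{1/2} Z$ with $X$ and $Z$ independent standard normal, and evaluates the expectations by Gauss-Hermite quadrature. The function name `hermite_correlations` is a hypothetical choice.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_correlations(rho, max_degree=4, nodes=20):
    """Matrix of E[psi_i(X) psi_j(Y)] for a standard bivariate normal pair."""
    t, w = hermegauss(nodes)              # nodes and weights for the weight exp(-t^2/2)
    w = w / np.sqrt(2.0 * np.pi)          # rescale to standard normal probabilities
    x = t[:, None]                        # X runs over the first quadrature axis
    y = rho * t[:, None] + np.sqrt(1.0 - rho ** 2) * t[None, :]   # Y = rho X + sqrt(1-rho^2) Z
    ww = w[:, None] * w[None, :]          # product weights for the independent pair (X, Z)

    def psi(i, v):
        coeffs = np.zeros(i + 1)
        coeffs[i] = 1.0                   # selects He_i in the Hermite series
        return hermeval(v, coeffs) / math.sqrt(math.factorial(i))

    R = np.empty((max_degree + 1, max_degree + 1))
    for i in range(max_degree + 1):
        for j in range(max_degree + 1):
            R[i, j] = np.sum(ww * psi(i, x) * psi(j, y))
    return R

rho = 0.6
R = hermite_correlations(rho)
# Partial sum of the squared canonical correlations rho^(2i), i >= 1.
phi2_series = sum(rho ** (2 * i) for i in range(1, 200))
```

Under these assumptions `R` should be diagonal with entries $\rho^i$, and the series of squared canonical correlations should approach $\rho^2/(1 - \rho^2)$, in agreement with (3) and (26).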

Dr. G. S. Watson (personal communication) has pointed out that the usual eigenfunction and kernel theory might be applied. The analogy is quite easy to establish in purely discrete or purely continuous distributions. In the continuous case we should define a kernel

(31)    $K(x, y) = f(x, y) \{g(x) h(y)\}^{-1/2},$

where $g(x) > 0$, $h(y) > 0$, with the convention that $K(x, y) = 0$ if $g(x) h(y) = 0$. $K(x, y)$ would in general be unsymmetric. It would follow that

(32)
$\rho_i y^{(i)} \sqrt{h(y)} = \int K(x, y)\, x^{(i)} \sqrt{g(x)}\, dx,$
$\rho_i x^{(i)} \sqrt{g(x)} = \int K(x, y)\, y^{(i)} \sqrt{h(y)}\, dy,$

in precisely the same way as in equations (26) and (27) of Schmidt [23], noting the different definitions for the eigenvalues. (32) is proved by the application of Theorem 3. In the finite discrete case, where the frequencies are $f_{ij}$, the kernel $K(x, y)$ is replaced by $f_{ij} f_{i.}^{-1/2} f_{.j}^{-1/2} = b_{ij}$ and this is discussed in the next section. (32) is simplified if the marginal distributions are rectangular with $g(x) = h(y) = 1$.


4. The Finite Case. The discussion above is a generalisation of a procedure, alternative to that of Fisher [3] and Maung [13], which may be used in the finite discrete case of an $m$ by $n$ contingency table with proportions $f_{ij}$ in the cell of the $i$th row and $j$th column, with $f_{i.} = \sum_j f_{ij} > 0$, $f_{.j} = \sum_i f_{ij} > 0$, and for definiteness, $m \le n$. It follows from Theorem 3 that if we construct matrices, $X$ and $Y$, with the $(k + 1)$th column consisting of the values of the $k$th canonical variable, then $X'FY$ will have a canonical form with zero elements everywhere except along the leading diagonal. It is found simpler to deal with a matrix $B$ derived from $F$ and then the problem is reduced to determining a canonical form for a rectangular matrix under pre- and post-multiplication by orthogonal matrices, which we consider by an adaptation of the argument of Murnaghan [15] on his pages 26 and 27. The defining conditions for the matrices $X$ and $Y$ may be written

(33)
$x_{i1} = 1, \quad i = 1, 2, \dots, m,$
$y_{i1} = 1, \quad i = 1, 2, \dots, n,$
$x_{ij} = x_i^{(j-1)} = \xi^{(j-1)}, \quad j = 2, 3, \dots, m,$
$y_{ij} = y_i^{(j-1)} = \eta^{(j-1)}, \quad j = 2, 3, \dots, n.$

(13) now becomes

(34)
$X' \operatorname{diag}(f_{i.})\, X = 1_m,$
$Y' \operatorname{diag}(f_{.j})\, Y = 1_n,$

and the elements of the leading diagonal of $X'FY$ are to be maximised. Theorem 4 ensures that it is sufficient and Theorem 2 that it is necessary for $X'FY$ to be in canonical form. We therefore state without completing the proof

THEOREM 5. Given an $m \times n$ contingency table with proportions $f_{ij}$ in the cell of the $i$th row and $j$th column, let an $m \times n$ matrix, $B$, be defined by

(35)    $b_{ij} = f_{ij} f_{i.}^{-1/2} f_{.j}^{-1/2}.$

Then orthogonal matrices $M$ and $N$ exist with elements of the first column $\sqrt{f_{i.}}$ and $\sqrt{f_{.j}}$ respectively such that $M'BN$ is in canonical form, namely

(36)    $M'BN = C = \{\operatorname{diag}(1, \rho_1, \dots, \rho_{m-1}),\ 0_{m, n-m}\}.$


It is evident further by a consideration of the forms of $(M'BN)(M'BN)'$ and $(M'BN)'(M'BN)$ that $M$ and $N$ are the orthogonal matrices that reduce $BB'$ and $B'B$ respectively to canonical form. Conversely, it can be shown that if $N$ transforms $B'B$ to canonical form with unity in the leading position and $k$ other non-zero diagonal elements, then an $M$, having for its first $(k + 1)$ columns the first $(k + 1)$ columns of $BN$ normalised, can be constructed so that $M'BN$ is in the required form. In fact, the first $(k + 1)$ columns of $BN$ are mutually orthogonal because $(BN)'(BN)$ is diagonal. Maung [13] obtains the latent roots of $BB'$ or $B'B$ by solving the determinantal equation, $|BB' - \lambda 1| = 0$, in the usual manner. An alternative is to use the iteration method of Frazer, Duncan and Collar ([6], page 133). We note further that $M$ and $N$ must be of the form

(37)
$M = M_1 (1 \dotplus M_2),$
$N = N_1 (1 \dotplus N_2),$

where $\dotplus$ denotes the direct sum and $M_1$ and $N_1$ are of the Helmert type with first columns having elements $\sqrt{f_{i.}}$ and $\sqrt{f_{.j}}$ respectively. Now the elements of

(38)    $M_1' B B' M_1 = 1 \dotplus W$


can be computed readily. Using the observed numbers, $a_{ij}$, in the contingency table,

(39)    $w_{kk'} = \dfrac{a_{..} \sum_j \left( a_{k+1,.} \sum_{i=1}^{k} a_{ij} - a_{k+1,j} \sum_{i=1}^{k} a_{i.} \right) \left( a_{k'+1,.} \sum_{i=1}^{k'} a_{ij} - a_{k'+1,j} \sum_{i=1}^{k'} a_{i.} \right) / a_{.j}}{\left\{ a_{k+1,.}\, a_{k'+1,.} \sum_{i=1}^{k} a_{i.} \sum_{i=1}^{k+1} a_{i.} \sum_{i=1}^{k'} a_{i.} \sum_{i=1}^{k'+1} a_{i.} \right\}^{1/2}}.$

The trace of $W$ is $\chi^2$. It does not take much more time to compute $W$ than $\chi^2$ if $m$ is not too large. A computing routine is to form a matrix with elements in the first row, $(a_{1j} a_{2.} - a_{1.} a_{2j})$, elements in the second row $(a_{1j} + a_{2j}) a_{3.} - (a_{1.} + a_{2.}) a_{3j}$ and so on. For each row, a standardising factor is computed,

$\left\{ a_{k+1,.} \sum_{i=1}^{k} a_{i.} \sum_{i=1}^{k+1} a_{i.} / a_{..} \right\}^{1/2}.$

The elements of $W$ are then simply computed by formula (39). The Helmert matrix can be looked upon as generating sets of orthonormal functions, which take a simple form. The values for the canonical variables are then calculated by an orthogonal transformation

(40)    $X = \operatorname{diag}(f_{i.}^{-1/2})\, M = \operatorname{diag}(f_{i.}^{-1/2})\, M_1 (1 \dotplus M_2),$

where $M_1$ is the Helmert matrix and $M_2' W M_2$ is diagonal, $M_2$ being obtained by iteration, and similarly $Y$ can be written in terms of $N_1$ and $N_2$.

A Numerical Example. Maung [13] has given the following example of a classification of Aberdeen schoolchildren by hair and eye colours (see Table 1).

A matrix of elements, $U$, with $u_{kj} = \left( a_{k+1,.} \sum_{i=1}^{k} a_{ij} - a_{k+1,j} \sum_{i=1}^{k} a_{i.} \right)$ is given by

 1,487,190     -273,082   -1,077,957      -110,090      -26,061
16,182,645      773,584   -8,895,366    -7,831,720     -229,143
19,806,181    1,123,770    7,415,022   -26,653,016   -1,691,957
TABLE I

                            Hair colour
Eye colour     Fair     Red   Medium    Dark   Black     Total
Blue           1368     170     1041     398       1      2978
Light          2577     474     2703     932      11      6697
Medium         1390     420     3826    1842      33      7511
Dark            454     255     1848    2506     112      5175
Total          5789    1319     9418    5678     157    22,361


The elements of this matrix are now divided by the corresponding column totals of the contingency table to give a matrix $(v_{kj})$. Divisors appropriate to each row of $U$ are now computed, $d_k = \left\{ a_{k+1,.} \sum_{i=1}^{k} a_{i.} \sum_{i=1}^{k+1} a_{i.} / a_{..} \right\}^{1/2}$. Then $w_{kk'}$ is $\sum_j u_{kj} u_{k'j} / \{a_{.j} d_k d_{k'}\}$ or $\sum_j v_{kj} u_{k'j} / \{d_k d_{k'}\}$. We thus obtain the matrix, $W$, of (38).

  65.8744811    237.1027158    173.4280109
 237.1027158   1167.9147643   1252.2082711
 173.4280109   1252.2082711   2450.0865906

The trace of $W$ is 3683.875836, agreeing with Maung's value for $\chi^2$.


The orthogonal matrix, $M_2$, of (37) is then derived from $W$ by an iteration process and is

0.085413    0.272546    0.958344
0.522636    0.806650   -0.275985
0.848266   -0.524438    0.073545


The values of the complete set of orthonormal variables associated with the Helmert matrix, $M_1$, may be displayed as a matrix,

1    2.279806    1.005036    0.548741
1   -1.013777    1.005036    0.548741
1    0          -1.294598    0.548741
1    0           0          -1.822352
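The matrix just displayed can be reproduced from the eye-colour totals of Table 1. The sketch below is not from the original paper; the function name `helmert_variables` is hypothetical, and the construction assumes the usual Helmert-type contrasts, each comparing one class with the pooled preceding classes and scaled to unit variance under the marginal proportions.

```python
import numpy as np

def helmert_variables(weights):
    """Orthonormal Helmert-type variables on a discrete margin.

    Column 0 is the constant 1; column j contrasts class j with classes 0..j-1
    and is scaled to have zero mean and unit variance under the proportions.
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                      # marginal proportions
    m = p.size
    cum = np.cumsum(p)                   # partial sums of the proportions
    H = np.zeros((m, m))
    H[:, 0] = 1.0
    for j in range(1, m):
        H[:j, j] = np.sqrt(p[j] / (cum[j - 1] * cum[j]))      # entries above the diagonal
        H[j, j] = -np.sqrt(cum[j - 1] / (p[j] * cum[j]))      # diagonal entry
    return H

row_totals = [2978, 6697, 7511, 5175]    # eye-colour totals from Table 1
H = helmert_variables(row_totals)
p_rows = np.asarray(row_totals, dtype=float) / sum(row_totals)
gram = H.T @ np.diag(p_rows) @ H         # should be the identity matrix
```

The Gram matrix `gram` checks the orthonormality condition, which is the finite analogue of (34) for this particular complete set.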


In the $j$th column, all elements above the diagonal are $\left\{ p_{j.} / \left( \sum_{i=1}^{j-1} p_{i.} \sum_{i=1}^{j} p_{i.} \right) \right\}^{1/2}$, the diagonal element is $-\left\{ \sum_{i=1}^{j-1} p_{i.} / \left( p_{j.} \sum_{i=1}^{j} p_{i.} \right) \right\}^{1/2}$ and elements below the diagonal are zero. Post-multiplication of this matrix by $(1 \dotplus M_2)$ yields the sets of canonical variables in the form of a $4 \times 4$ matrix, $X$, of Equation (40)

1   +1.1855   +1.1443   +1.9478
1   +0.9042   +0.2466   -1.2086
1   -0.2111   -1.3321   +0.3976
1   -1.5458   +0.9557   -0.1340

The values of the elements agree with those given by Maung.
The canonical variables in $y$ can now be obtained by using Fisher's algorithm as in (45), below, and we may write the first four columns of the matrix, $Y$, as

1   +1.3419   +0.9713   +0.3288
1   +0.2933   -0.0236   -3.7389
1   +0.0038   -1.1224   +0.1666
1   -1.3645   +0.7922   +0.3625
1   -2.8278   +3.0607   -3.8177

Programs, similar to the computational process used above, are now available on electronic computers.
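As an illustration of such a program, the canonical form of Theorem 5 can be obtained with a singular value decomposition of $B$. This is a substitute technique, not the Helmert-and-iteration routine described above, and the function name `canonical_analysis` and the sign convention (first row of $X$ positive) are assumptions of this sketch.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour; columns: hair colour).
A = np.array([
    [1368, 170, 1041,  398,   1],
    [2577, 474, 2703,  932,  11],
    [1390, 420, 3826, 1842,  33],
    [ 454, 255, 1848, 2506, 112],
], dtype=float)

def canonical_analysis(counts):
    """Canonical correlations and variables of a two-way table via the SVD of B."""
    F = counts / counts.sum()                    # proportions f_ij
    fr, fc = F.sum(axis=1), F.sum(axis=0)        # margins f_i. and f_.j
    B = F / np.sqrt(np.outer(fr, fc))            # b_ij = f_ij / sqrt(f_i. f_.j), as in (35)
    M, s, Nt = np.linalg.svd(B, full_matrices=False)
    X = M / np.sqrt(fr)[:, None]                 # X = diag(f_i.^(-1/2)) M
    Y = Nt.T / np.sqrt(fc)[:, None]              # Y = diag(f_.j^(-1/2)) N
    signs = np.sign(X[0, :])                     # sign convention: first row of X positive
    signs[signs == 0] = 1.0
    return s, X * signs, Y * signs               # the same sign flip keeps correlations positive

s, X, Y = canonical_analysis(A)
n_obs = A.sum()
chi2 = n_obs * np.sum(s[1:] ** 2)                # N * phi^2, the sum of squared canonical correlations
```

Here `s[0]` is the trivial unit root belonging to the constant variables, `s[1:]` holds the canonical correlations, and the columns of `X` and `Y` after the first are the canonical variables, which may be compared with the matrices displayed above.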

Interpreting the findings, the first set of canonical variables arranges both hair 
colour and eye colour in the same order as was suggested by biological considera- 
tions. If there is an underlying bivariate distribution the first set of canonical 
variables gives the best values to be assigned to the marginal variables. 


5. Identifications of the Finite and the General Cases. We now state some corollaries deducible from the theorems above in such a way as to bring out the identity of the theory of canonical correlation as a special case of the more general theory; where appropriate, we have numbered these “a” for the finite case, “b” for the more general.

COROLLARIES. 

(ia). $\rho_i^2$ are the non-zero latent roots of the matrices $BB'$ and $B'B$; $\rho_i$ are the “roots” of $B$ under transformation by pre- and post-multiplication of $B$ by orthogonal matrices.

(ib). $\rho_i^2$ are the eigenvalues of certain symmetric kernels and $\rho_i$ are the eigenvalues of a certain, possibly asymmetric, kernel.

(iia). The identity of Fisher [4]

(41)    $f_{ij} = f_{i.} f_{.j} \left\{ 1 + \sum_{k=1}^{m-1} \rho_k x_i^{(k)} y_j^{(k)} \right\}$

is a special case of our Theorem 3. It is also proved by noting that

(42)    $X'FY = M'BN = C,$

and the inverse of $X'$ is $\operatorname{diag}(f_{i.})\, X$ and the inverse of $Y$ is $Y' \operatorname{diag}(f_{.j})$ by (34).

(iib). The generalisation of Fisher's identity is given by Theorem 3.
(iiia) and (iiib). If $m_k$ and $n_k$ are the $k$th column vectors of $M$ and $N$ respectively

(43)
$\rho_k n_k = B' m_k,$
$\rho_k m_k = B n_k,$

or alternatively after (36)

(44)
$BN = MC,$
$B'M = NC',$

(45)
$FY = \operatorname{diag}(f_{i.})\, XC,$
$F'X = \operatorname{diag}(f_{.j})\, YC'.$

(45) corresponds exactly with equations (26) and (27) of Schmidt [23] as modified in our (32). The equation (45) is the basis of Fisher's [3] algorithm for the computation of the canonical correlations, which we give as a corollary.

(iv). The canonical variables can be obtained by iteration if $\rho_{j+1} > \rho_{j+2}$. From (45) it follows that

(46)    $\operatorname{diag}(f_{i.}^{-1})\, F \operatorname{diag}(f_{.j}^{-1})\, F' X = X C C',$

and so

(47)    $\left( \operatorname{diag}(f_{i.}^{-1})\, F \operatorname{diag}(f_{.j}^{-1})\, F' \right)^p X x_0 = X (CC')^p x_0.$

Therefore if any vector $x_0$ is taken orthogonal to the first $j$ columns of $X$ but not orthogonal to the $(j + 1)$th column, the iteration of the form (45) will yield a vector proportional to the $(j + 1)$th column of $X$. This is a special case of iterating using Schmidt's (26) and (27), which we could rewrite as (ivb).
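The iteration of corollary (iv) can be sketched for the first canonical variable as follows. This is an illustrative sketch rather than Fisher's published routine: the function name `first_canonical_by_iteration`, the trial scores 1, 2, ..., m, and the fixed number of iterations are all assumptions. The constant (zero-order) component is removed at each step so that the iteration converges to the first non-trivial canonical variable.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour; columns: hair colour).
A = np.array([
    [1368, 170, 1041,  398,   1],
    [2577, 474, 2703,  932,  11],
    [1390, 420, 3826, 1842,  33],
    [ 454, 255, 1848, 2506, 112],
], dtype=float)

def first_canonical_by_iteration(counts, n_iter=200):
    """Power iteration with diag(f_i.)^-1 F diag(f_.j)^-1 F' on centred row scores."""
    F = np.asarray(counts, dtype=float)
    F = F / F.sum()
    fr, fc = F.sum(axis=1), F.sum(axis=0)
    T = (F / fr[:, None]) @ (F.T / fc[:, None])   # the operator appearing in (46)
    x = np.arange(1.0, F.shape[0] + 1.0)          # arbitrary trial scores
    rho_sq = 0.0
    for _ in range(n_iter):
        x = x - fr @ x                            # orthogonal to the constant column
        x = x / np.sqrt(fr @ x ** 2)              # unit variance under the row margin
        x_new = T @ x
        rho_sq = fr @ (x * x_new)                 # Rayleigh quotient in the metric diag(f_i.)
        x = x_new
    x = x - fr @ x
    x = x / np.sqrt(fr @ x ** 2)
    return float(np.sqrt(rho_sq)), x

rho_1, x_1 = first_canonical_by_iteration(A)
```

Convergence is geometric at a rate governed by $(\rho_2/\rho_1)^2$, which is why the corollary requires the leading root to be strictly the largest.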

(v). In Yates [26], the problem arises of finding values for $y$ such that $y$ will have maximum correlation with an $x$ which has prescribed values.

We may write

(48)    $x = \sum_{i=1}^{m-1} a_i x^{(i)}, \quad \sum_i a_i^2 = 1.$

Then from the canonical form of Theorem 3 and the use of the Cauchy inequality, we find that

(49)    $y = \sum_{i=1}^{m-1} a_i \rho_i y^{(i)},$

is such that the correlation of $x$ and $y$ is maximal and

(50)    $\operatorname{corr}(x, y) = \left( \sum_{i=1}^{m-1} a_i^2 \rho_i^2 \right)^{1/2}.$

(vi). In either finite or infinite cases, it can be proved that the existence of $k$ canonical correlations of unity means the distribution consists of $(k + 1)$ disjunct pieces. The case of one canonical correlation of unity has been treated by Richter [21].

6. Regression in the Bivariate Distribution. If the bivariate surface can be 
described in the canonical form (25), then regression takes a particularly simple 
form. 

THEOREM 6. The regressions of the canonical variables are given by the lines,

(51)    $x^{(i)} = \rho_i y^{(i)}, \qquad y^{(i)} = \rho_i x^{(i)}.$

For $i \ne j$ the regressions of $x^{(i)}$ on $y^{(j)}$ and of $y^{(i)}$ on $x^{(j)}$ are zero.

PROOF. This follows in the usual way by minimising

$\iint (x^{(i)} - r y^{(j)})^2\, dF(x, y).$

Incidentally, we have proved that the regression of $x^{(i)}$ on $y^{(i)}$ is linear since any square summable function of $y$ orthogonal to $y^{(i)}$ can be expanded in terms of the other orthonormal functions.


7. Generalization of the Notion of Correlation. Many attempts have been made to find some way of obtaining bivariate distributions which would generalize the normal case. Pretorius [20] has given many references to such attempts.

Fisher's theory of canonical correlation gives an alternative approach. Suppose we are given marginal variables with distribution functions $G(x)$ and $H(y)$, then a bivariate distribution can be formed using (25) provided that the series $\left\{ 1 + \sum_{i=1}^{\infty} \rho_i x^{(i)} y^{(i)} \right\}$ is non-negative at points corresponding to increase in both $G(x)$ and $H(y)$. We may take one of the simplest possible pairs of distributions for the margins, namely the rectangular over the range $-\tfrac{1}{2}$ to $\tfrac{1}{2}$ and set up three different bivariate distributions.
EXAMPLE 1. We take as our orthonormal sets of functions the normalised
Legendre polynomials, in particular

$x^{(1)} = x\sqrt{12},$

$x^{(2)} = 6\sqrt{5}\,(x^2 - \tfrac{1}{12}).$

We can now assign correlations $\rho_1$ and $\rho_2$ subject to the condition that the density
becomes nowhere negative,

(53)    $dF(x, y) = \{1 + 12\rho_1 xy + 180\rho_2 (x^2 - \tfrac{1}{12})(y^2 - \tfrac{1}{12})\}\, dx\, dy.$


But the maximum absolute value of $x^{(1)}y^{(1)}$ is 3 and that of $x^{(2)}y^{(2)}$ is 5, so the
expression in (53) will be positive if

(54)    $3|\rho_1| + 5|\rho_2| < 1.$
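
The positivity claim for the density (53) can be checked numerically. The sketch below is illustrative only: the function names, the grid size and the trial values of $\rho_1$ and $\rho_2$ are assumptions, and the sufficient condition $3|\rho_1| + 5|\rho_2| < 1$ is the one suggested by the stated maxima 3 and 5. It evaluates (53) on a midpoint grid over the square and also confirms that the product-moment correlation of x and y is close to $\rho_1$.

```python
def density(x, y, rho1, rho2):
    """Density (53) on the square [-1/2, 1/2] x [-1/2, 1/2]."""
    return (1.0 + 12.0 * rho1 * x * y
            + 180.0 * rho2 * (x * x - 1.0 / 12.0) * (y * y - 1.0 / 12.0))


def grid_summary(rho1, rho2, n=200):
    """Midpoint-rule estimates of (minimum density, total mass, corr(x, y))."""
    h = 1.0 / n
    pts = [-0.5 + (i + 0.5) * h for i in range(n)]
    lowest, mass, exy = float("inf"), 0.0, 0.0
    for x in pts:
        for y in pts:
            f = density(x, y, rho1, rho2)
            lowest = min(lowest, f)
            mass += f * h * h
            exy += x * y * f * h * h
    # Both margins are uniform with mean 0 and variance 1/12.
    return lowest, mass, exy / (1.0 / 12.0)


# Hypothetical trial values satisfying 3|rho1| + 5|rho2| = 0.85 < 1.
lowest, mass, corr = grid_summary(rho1=0.2, rho2=0.05)
print(lowest > 0.0, round(mass, 4), round(corr, 3))
```

A pair violating the condition, such as $\rho_1 = 0.5$, $\rho_2 = 0$, gives a negative minimum on the same grid, which is why the correlations cannot be assigned freely.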


EXAMPLE 2. We choose the cosine series as the orthonormal sets,

$x^{(1)} = \sqrt{2}\cos(2\pi x),$

$x^{(2)} = \sqrt{2}\cos(4\pi x),$

and similarly define $y^{(1)}$ and $y^{(2)}$. Then

(56)    $dF(x, y) = \{1 + 2\rho_1 \cos(2\pi x)\cos(2\pi y) + 2\rho_2 \cos(4\pi x)\cos(4\pi y)\}\, dx\, dy.$

This is non-negative if the absolute values of $\rho_1$ and $\rho_2$ are both less than $\tfrac14$.

EXAMPLE 3. A further possibility results from forming arbitrary bivariate
distributions, e.g., we might divide the square with corners at $(\pm\tfrac12, \pm\tfrac12)$ into
four quarters and add $\rho_1$ to the density in the first and third quadrants and
subtract $\rho_1$ from the density in the second and fourth quadrants. We could also
subdivide the original square into 16 parts and add $\rho_2$ to the four corner
subdivisions and to the four central subdivisions and subtract $\rho_2$ from the remainder.
The resulting distribution can be described with the aid of step-functions

(57)    $dF(x, y) = \{1 + \rho_1 x^{(1)} y^{(1)} + \rho_2 x^{(2)} y^{(2)}\}\, dx\, dy,$

where

(58)    $x^{(1)} = +1$ for $x \ge 0$, $-1$ for $x < 0$;
        $x^{(2)} = -1$ for $-\tfrac14 < x < \tfrac14$ and $+1$ for $x$ elsewhere.


To obtain a complete set of orthogonal functions defined on $[-\tfrac12, \tfrac12]$ we divide
this interval into four subintervals of equal length. On each, complete sets of
orthonormal functions may be defined. For example, we may choose the Legendre
polynomials as our set, standardized so as to be orthonormal on the uniform
distribution on $[-\tfrac12, \tfrac12]$. Corresponding to the first interval we define a set of
orthogonal polynomials which have the values $1, P^{(i)}(X)$, $i = 1, 2, \cdots$, where
$X + \tfrac12$ is the fractional part of $4(x + \tfrac12)$, on the first interval and zero elsewhere,
and similar sets on the other subintervals. The four sets of functions may be
displayed as the elements of a four rowed matrix, P, of infinitely many columns.
The rows of this matrix are obviously mutually orthogonal since no two elements
of the same column can be simultaneously non-zero. Let us now define Q = AP,
where A is the matrix


| | 
—] l 
—l —!1 
i —] 


‘The clements of Q are now an orthonormal set on the whole interval. gn r 


a term constant on [3, 3]. ge Cy ds. = ©. Ga IS necessary for completeness 


It is constant on any subinterval but changes sign being — 1 on the odd intervals 

very other function g;; of the form + P?°"(X). They” may be similarly defined. 
3 dij : : J 

It is clear from the examples that the same correlations can arise in a great 


many different ways. In the next section, we show how the methods can be used 
as a test of normality. 

These three examples show how bivariate distributions can be formed with 
arbitrarily prescribed correlation coefficients. Barrett and Lampard [1] give two 
other examples where such bivariate distributions arise naturally out of a physical 
problem.


8. A Canonical Partition of $\chi^2$. In testing whether a bivariate distribution
is normal, the marginal distributions can be tested in the usual way by an overall
$\chi^2$, or the individual degrees of freedom can be displayed as previously suggested
by Lancaster [11] by the aid of orthogonal polynomials. Moreover, according
to the analysis of the present paper and that of Lancaster [11] the regressions
of the orthogonal polynomials in x and y on one another should be zero
except for polynomials of the same degree. We therefore may compute the regressions
and display them in the form of a matrix, which we explain with the aid of
a well-known example, the correlation table of Pearson and Lee (Biometrika
2, 257), easily accessible in ([5], paragraph 30). After estimating the mean and
variance of both variables, the regressions of the theoretical Hermite-Chebishev
polynomials in one variable on those of the other may be computed and set out as
suggested by Lancaster [11]. The means and standard deviations have been computed
using n as a divisor. The table of Pearson and Lee has been modified to 8 columns
representing classifications of daughters’ heights. The sums of products
of polynomials of the form $\psi_i(x)\psi_j(y) f_{ij}$ have been computed and divided by
$\sqrt{1376}$, where 1376 is the number of observations, to give component $\chi$’s of a
partition of $\chi^2$. The leading 4 × 4 submatrix is as follows.


. . ied 
19.238 —0.053 —1.834 
0.398 8.325 —0.460 
—0.328 —0.578 —0.350 2.390 


The term 19.238 corresponds to the regression of the first polynomial in the
fathers’ heights on the first polynomial in the daughters’ heights and to a correlation
of 0.5186, which is slightly different from that given by Fisher [5] as the grouping
is different. It may be noted also that the squares of the 3 × 3 submatrix excluding
the marginal terms account for over 446 of a $\chi^2$ of 504.23 if the table is
analysed by the usual $\chi^2$ with fixed marginal totals, so that all the significant
departure from independence is shown to be accounted for by the first three not
identically zero diagonal terms, the sum of whose squares is 445.

Pearson [19] gave a rule which substantially states that the number of degrees
of freedom must be subtracted from the $\chi^2$ of the test of homogeneity when
computing $\phi^2$. We have


$\phi^2 = (504.234 - 98)/1376 = 0.295228,$

$\rho^2 = 0.295228/1.295228 = 0.227935,$

$\rho = 0.477,$

which gives a correlation approximately equal to that calculated here, 0.5186.
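
The arithmetic of Pearson's rule above is easily reproduced. The following lines are only a check of the quoted figures; the variable names are ours, and the last quantity assumes, as the text indicates, that the leading component 19.238 divided by the square root of the number of observations is the first correlation.

```python
import math

n_obs = 1376          # number of observations in the Pearson and Lee table
chi2_total = 504.234  # chi-square of the table with fixed marginal totals
dof = 98              # degrees of freedom subtracted under Pearson's rule

phi2 = (chi2_total - dof) / n_obs
rho2 = phi2 / (1.0 + phi2)
rho = math.sqrt(rho2)

# Correlation implied by the leading component chi of 19.238.
rho_leading = 19.238 / math.sqrt(n_obs)

print(round(phi2, 6), round(rho2, 6), round(rho, 3), round(rho_leading, 4))
```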

An alternative canonical partition is given by estimating the means and variances
and computing the marginal frequencies on the assumption of normality.
A partition of $\chi^2$ is obtained as shown in Table II.

It is clear that the distribution of Pearson and Lee is fitted very well by the
assumption that it is a sample of a bivariate normal distribution. The residual $\chi^2$
of 101.04 with 95 degrees of freedom represents the sums of squares due to all
other regressions than the first three regressions of the form $\psi_i(x)$ on $\psi_i(y)$. The
assumption of normality of the marginal distributions and a non-zero correlation
are sufficient to account for the total $\chi^2$, for the residual $\chi^2$ is little greater than
the corresponding degrees of freedom.

TABLE II

Source of $\chi^2$                                                  Degrees of Freedom

Difference of distribution of father’s heights from theoretical
Difference of distribution of daughter’s heights from theoretical
Regression of $\psi_1(y)$ on $\psi_1(x)$
Regression of $\psi_2(y)$ on $\psi_2(x)$
Regression of $\psi_3(y)$ on $\psi_3(x)$
Residual
Total





9. Summary. The problems of Hirschfeld [8] and of the description of a
contingency table by means of the canonical variables and correlations have been
generalised to distributions limited only by the condition that the Pearson $\phi^2$ is
finite. Any theoretical or observed distribution subject to this condition can be
described by the canonical variables (that is, subsets of complete sets of orthogonal
functions in the variables of the two marginal distributions, which obey
the second orthogonality condition that $E\,x^{(i)}y^{(j)}$ is zero for $i \neq j$) and the canonical
correlations. The theory of Fisher [3], Maung [13] and Williams [25] has been
related to the eigenfunction theory.

Mehler’s identity, or in statistical language, the expansion of the bivariate 
normal frequency in tetrachoric functions, has been generalised. The approach of 
Maung [13] has been modified to allow for an extension of the canonical theory to 
continuous marginal distributions. 

The methods used give a new test of goodness of fit for the bivariate normal 
distribution and enable populations to be constructed with arbitrary marginal 
distributions and correlations. 
ON RENEWAL PROCESSES RELATED TO TYPE I AND TYPE II
COUNTER MODELS¹


By RONALD PYKE
Stanford University


Summary. Several renewal processes related to the Type I and Type II counter 
models are defined and studied. The distribution and characteristic functions 
for the secondary (or output) process of the Type I counter model are obtained 
explicitly. Both the non-stationary and stationary probabilities of the state of 
the counter, (locked or unlocked), are derived. Integral equations determining 
the distribution and characteristic functions for the secondary process of the 
Type II counter model are obtained. Also it is shown that a more general model 
proposed by Albert and Nelson [1] may be solved explicitly in terms of a cor- 
responding Type II counter model. An example of this general model is given. 
Related with each model is a discrete renewal process which is also studied. 


1. Introduction and Notation. Two important classes of counting devices are 
the Type I and Type II counters defined as follows. A counter for detecting 
radioactive impulses is placed within range of a radioactive material. By “an
event has happened”, we mean that an impulse has been emitted by the material
and by “an event has been registered”, we mean that an impulse emitted by the
material has been detected and recorded by the counter. Due to the inertia of
the counting device, some impulses will probably not be counted. The time during
which the device is unable to record an impulse is referred to as deadtime.

DEFINITION. A Type I counter is one in which deadtime is produced only after
an event has been registered. A Type II counter is one in which deadtime is
produced after each event has happened. Examples of Type I and Type II counters
are the Geiger-Müller counters and electron multipliers respectively.

In sections 4 to 7, attention will be given only to the Type I problem. It is
stated theoretically as follows. Let X, Y and Z be random variables (r.v.) with
distribution functions (d.f.) F, G and H respectively. Let $\{X_i\}_{i=1}^{\infty}$, $\{Y_j\}_{j=0}^{\infty}$ be
independent X- and Y-renewal processes; that is, $\{X_i, Y_j : i \ge 1, j \ge 0\}$ is a
family of mutually independent r.v.’s and each $X_i$ and $Y_j$ has d.f. F and G
respectively. Set $X_0 = 0$ (a.s.) and $S_k = \sum_{i=0}^{k} X_i$ for $k = 0, 1, 2, \cdots$. Assume
throughout this discussion that $F(0) = G(0-) = 0$, F is a non-lattice distribution
and that all d.f.’s are right continuous. Define $n_0 = 0$ and

$n_j = \min\{k \in I^+ : S_k > Y_{j-1} + S_{n_{j-1}}\}$

for $j = 1, 2, 3, \cdots$, where $I^+$ is the set of positive integers. The above definitions
are valid with probability one.

The secondary renewal process, $\{Z_j\}_{j=1}^{\infty}$ (to be referred to as the Z-process),
is defined by

$Z_j = S_{n_j} - S_{n_{j-1}} \qquad (j \in I^+).$

This is clearly a renewal process since the $S_j$’s are sums of independent r.v.’s
and since $\{n_j - n_{j-1}\}_{j=1}^{\infty}$, a sequence of identically and independently distributed
r.v.’s, is itself a renewal process. $\{n_j - n_{j-1}\}_{j=1}^{\infty}$ shall be referred to as the
N-process, and H shall denote the common c.d.f. of the Z-process. It will be shown
that $E(n_1)$ denotes the asymptotic bias of the counter.

One may define a related stochastic process which is of interest in counting
problems. Let $\{V_t : t \ge 0\}$ be a stochastic process, having a two point range space,
with joint distribution functions derived from its definition which is: $V_0 = 0$
(a.s.) and

$V_t = 1$ if $S_{n_{k-1}} + Y_{k-1} \le t < S_{n_k}$ for some $k \in I^+$,
$V_t = 0$ otherwise.

Set

$P_1(t) = 1 - P_0(t) = \Pr[V_t = 1],$

$P_1 = 1 - P_0 = \lim_{t \to \infty} P_1(t)$

if the limit exists.

A subscript, j say, affixed to any distribution function will denote its jth convolution
with itself. The zero subscript will denote the c.d.f. degenerate at zero.

In sections 8 and 9 the Type II problem is studied. Its theoretical formulation
differs from the Type I problem only in the definition of the N-process, which
for the Type II problem is $n_0 = 0$ and

(1)    $n_j = \min\{k \in I^+ : k > n_{j-1},\ S_k > S_r + Y_r;\ r = n_{j-1}, \cdots, k - 1\}.$

In all other instances, the definitions remain unchanged. For example, the secondary
renewal process is still given by

$Z_j = S_{n_j} - S_{n_{j-1}} \qquad (j \in I^+),$

although it is clearly a different process. The same notation is used for both
models in order to emphasize to the reader the common interpretation of the
various symbols.
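
The two definitions of the N-process may be made concrete by a small sketch. It is an illustration under our own conventions, not part of the formal development: the function names are hypothetical, the event at time zero is taken to be registered, and the deadtimes are supplied as lists (one per registration for Type I, one per event for Type II).

```python
def type1_registrations(arrivals, deadtimes):
    """Registered event times for a Type I counter.

    arrivals  -- increasing event times S_0 = 0, S_1, S_2, ...
    deadtimes -- deadtimes[j] is the deadtime started by the j-th registration
    """
    registered = []
    locked_until = float("-inf")
    for s in arrivals:
        if s > locked_until:                  # strict inequality, as in n_j
            locked_until = s + deadtimes[len(registered)]
            registered.append(s)
    return registered


def type2_registrations(arrivals, deadtimes):
    """Registered event times for a Type II counter.

    deadtimes[r] is the deadtime started by the r-th event, registered or not.
    """
    registered = []
    locked_until = float("-inf")
    for r, s in enumerate(arrivals):
        if s > locked_until:
            registered.append(s)
        # Every event, counted or not, may prolong the locked period.
        locked_until = max(locked_until, s + deadtimes[r])
    return registered


arrivals = [0.0, 0.9, 1.5, 2.2, 4.0]   # hypothetical event times
deadtimes = [1.0] * len(arrivals)       # constant deadtime d = 1
print(type1_registrations(arrivals, deadtimes))
print(type2_registrations(arrivals, deadtimes))
```

With these inputs the Type I counter registers the events at 0.0, 1.5 and 4.0, whereas the Type II counter registers only those at 0.0 and 4.0, since each uncounted event prolongs the locked period.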

In section 10 a more general model, suggested by Albert and Nelson [1], is 
studied. It is shown that the solution of this more general model is an immediate 
consequence of the solution of a corresponding Type II problem. 

We shall begin in section 3 by proving a theorem from which the quantities 
P,(t) are immediately deducible. 


To understand the connection between the above notation and the counter
problem itself, let $Y_j$ represent the deadtime caused respectively by the registration
of an event at time $S_{n_j}$ in the Type I model and the happening of an event
at time $S_j$ in the Type II model (time being measured from the registration of
some event) and let $X_k$ be the time between the kth and (k + 1)-st impulses.
The secondary renewal process is determined by the r.v. $Z_j$, which denotes the
time between successive counts or registrations. The event $[V_t = 1]$ corresponds
to the counter being unlocked at time t. For a more detailed description of the
physical problem, the reader is referred to the references. (See e.g., Feller [2].)

Throughout this paper, the integrals that appear are to be considered as
Lebesgue-Stieltjes integrals. This will avoid the special considerations that
would otherwise be required in cases where the integrand has a set of discontinuities
of positive measure with respect to the Stieltjes measure. Notice that
the ordinary integration-by-parts formula holds for the Lebesgue-Stieltjes
integrals that appear in this paper. A proof of this is possible by probabilistic
methods.


2. The literature and known results. The Type I and Type II counter prob- 
lems have been studied by several people. Most of these studies deal with the 
special case in which the input process is Poisson. Not only does the Poissonian 
input make the problems involved more tractable, but in this instance, it serves 
to make the statistical model very realistic, since the impulses from a radioactive 
material behave randomly over time, at least in time intervals which are short 
relative to the half-life of the material. For an extensive bibliography, the reader 
is referred to Takacs [3]. 

It is important, however, to study the more general non-Poissonian models
for several reasons. First of all, it is necessary at times to make successive counts 
and it is known that the secondary process of the first counter, which would 
serve as the input process for the second counter, is not a Poisson process even 
though the original process was. Secondly, these same theoretical models have 
arisen in other contexts in which the Poisson process is not so easily justified 
(e.g., in inventory theory, Arrow, Karlin and Scarf [4]). 

In his recent paper, [5], received by this author after completion of the first 
draft of this paper, Takacs also studies the general counter problem. Although 
there is some overlap, there are many differences in approach and coverage between
the two treatments of the problem. Theorem 2 is equivalent to results obtained
by Takacs in [3] and again in [5], for the case of continuous F and G.
Even for this case, however, our result (4) is a simplification in that a double 
integral has been replaced by a single one. Attention should also be given to a 
recent paper of Smith [6], in which the Type II counter model with Poissonian 
input (and related quasi-Poissonian inputs) as well as the model with constant 
deadtime, is studied. 


3. A related renewal problem. In this section, we shall consider two alternating
renewal processes, not necessarily independent, and obtain explicitly the probabilities,
both finite and stationary, of one of the processes being in effect at any
given instant of time. To be more precise, let $\{U_i\}_{i=1}^{\infty}$, $\{V_i\}_{i=1}^{\infty}$ be two renewal
processes with common c.d.f. K and R respectively. By definition $U_i$ and $U_j$ ($i \neq j$)
are independent and similarly for $V_i$ and $V_j$. Concerning the relationship between
the two processes assume only that $\{U_i + V_i\}_{i=1}^{\infty}$ forms a renewal process;
that is, independence of $U_i$ and $V_i$ is not assumed. Let H denote the common
c.d.f. of $U_i + V_i$ for all i. Define $T_0 = 0$ and, for $j \ge 1$, set

$T_{2j} = U_1 + V_1 + U_2 + V_2 + \cdots + U_j + V_j,$
$T_{2j-1} = U_1 + V_1 + U_2 + V_2 + \cdots + U_j.$

Define

$A(t) = 1$ if $T_{2j-1} \le t < T_{2j}$ for some $j \ge 1$,
$A(t) = 0$ otherwise,

$P_0(t) = 1 - P_1(t) = \Pr\{A(t) = 0\}.$

THEOREM 1. For all $t \ge 0$

$P_0(t) = \int_0^t [1 - K(t - x)]\, dN(x)$

where $N(x) = \sum_{j=0}^{\infty} H_j(x)$ and $H_j$ is the c.d.f. of $T_{2j}$, i.e., the jth convolution of H.
Moreover,

$P_0 = \lim_{t \to \infty} P_0(t) = \dfrac{E(U)}{E(U) + E(V)}$

whenever at least one term of the denominator is finite. $P_0$ is interpreted as being zero
when $E(V) = \infty$ and one when $E(U) = \infty$.
PROOF. By definition,

$P_0(t) = \sum_{j=0}^{\infty} \Pr[T_{2j} \le t < T_{2j+1}]$

$\quad = \sum_{j=0}^{\infty} \int_0^t \Pr[T_{2j} \le t < T_{2j} + U_{j+1} \mid T_{2j} = x]\, dH_j(x)$

$\quad = \sum_{j=0}^{\infty} \int_0^t [1 - K(t - x)]\, dH_j(x)$

$\quad = \int_0^t [1 - K(t - x)]\, dN(x)$

as required. Since we are working with an at most countable family of r.v.’s,
the conditional probability argument used above and in proofs which follow is
valid. The second statement of the theorem is an immediate application of a
theorem of Smith ([7], Theorem 1) which we quote in a particular form for
further reference.

THEOREM S: If k(x) is any bounded function, zero for negative argument, integrable,
non-increasing in $(0, \infty)$, for which $k(x) \to 0$ as $x \to \infty$; if H is a non-negative
non-lattice distribution function and

$N(x) = \sum_{j=0}^{\infty} H_j(x)$

then

$\lim_{t \to \infty} \int_0^t k(t - x)\, dN(x) = \int_0^{\infty} k(x)\, dx \Big/ \int_0^{\infty} y\, dH(y).$

The right hand side is to be taken as zero whenever the denominator is infinite.

In connection with the last statement of Theorem 1, observe that $P_0(t)$ converges
to the stated limit since the function $k(x) = 1 - K(x)$ satisfies the conditions
of Theorem S. We mention also that the last statement of Theorem 1 is
actually a special case of a result concerning semi-Markov processes, given by
Smith ([8], cf. Theorem 5).
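
Theorem 1 may be illustrated by simulation. The sketch below is not part of the argument; it assumes, purely for illustration, independent exponential U and V with hypothetical rates a and b, for which the process is a two-state Markov chain and $P_0(t) = b/(a+b) + \{a/(a+b)\}e^{-(a+b)t}$. The function name and sample size are our own choices.

```python
import math
import random


def prob_in_u_phase(t, draw_u, draw_v, n_paths, rng):
    """Monte Carlo estimate of P_0(t) = Pr[T_{2j} <= t < T_{2j+1} for some j]."""
    hits = 0
    for _ in range(n_paths):
        clock = 0.0
        while True:
            clock += draw_u(rng)      # a U-period ends at T_{2j+1}
            if t < clock:
                hits += 1
                break
            clock += draw_v(rng)      # a V-period ends at T_{2j+2}
            if t < clock:
                break
    return hits / n_paths


a, b = 1.0, 2.0                       # hypothetical rates: U ~ Exp(a), V ~ Exp(b)
rng = random.Random(0)
estimate = prob_in_u_phase(1.0, lambda r: r.expovariate(a),
                           lambda r: r.expovariate(b), 20000, rng)
exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 1.0)
limit = (1.0 / a) / (1.0 / a + 1.0 / b)   # E(U) / (E(U) + E(V))
print(round(estimate, 3), round(exact, 3), round(limit, 3))
```

The estimate should agree with the closed form to within sampling error, and the closed form tends to $E(U)/\{E(U) + E(V)\}$ as t increases, as the theorem asserts.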


4. The N-Process of the Type I Model. Set $p_0 = 0 = r_0$,

$p_k = \Pr[n_1 = k] = \Pr[n_j - n_{j-1} = k] \qquad (j, k \in I^+)$

and

$r_k = \Pr[n_j = k \text{ for some } j] \qquad (k \in I^+).$

Moreover, define the corresponding generating functions, for $|s| < 1$,

$P(s) = \sum_{k=1}^{\infty} p_k s^k, \qquad R(s) = \sum_{k=1}^{\infty} r_k s^k.$

The N-Process may be considered as a sampling of the positive integers $I^+$;
that is, $n_1 < n_2 < n_3 < \cdots$ and $\{n_j, j \ge 1\} \subset I^+$. In this context, one may
speak of the event E, “an integer is sampled.” One may show that, in the terminology
of Feller [9], this event is recurrent. Since, for all $k \in I^+$,

$r_k = p_k + \sum_{j=1}^{k-1} p_j r_{k-j},$

one obtains directly the known relationships

$P(s) = \dfrac{R(s)}{1 + R(s)}, \qquad R(s) = \dfrac{P(s)}{1 - P(s)}.$

Moreover, it is known that (cf. [9])

(2)    $\lim_{k \to \infty} r_{km} = \lim_{s \to 1-} \dfrac{(1 - s^m) P(s)}{1 - P(s)} = m/E(n_1)$

where m is the g.c.d. of those indices n for which $p_n > 0$. The right hand side of
(2) is to be interpreted as zero whenever $E(n_1)$, the ‘mean recurrence time,’ is
infinite.

The probabilities $p_k$ are readily computed from the relation

$p_k = \Pr[S_{k-1} \le Y_0 < S_k].$

They are given in
LEMMA 1. For all $k \in I^+$

$p_k = \int_{0-}^{\infty} [F_{k-1}(y) - F_k(y)]\, dG(y).$

Observe that the event E is a certain event. That is

$\sum_{k=1}^{\infty} p_k = 1 - \lim_{n \to \infty} \int_{0-}^{\infty} F_n(y)\, dG(y) = 1$

since $\lim_{n \to \infty} F_n(y) = 0$ for all $y \ge 0$ if and only if $F(0) < 1$, a condition which
has been assumed.
Define the r.v. $N_y$ for $y > 0$ as the smallest index k for which $S_k > y$. Set
$Q_y(s)$ as the generating function of the probabilities associated with $N_y$. One
may then easily show that for $|s| < 1$

$P(s) = \int_{0-}^{\infty} Q_y(s)\, dG(y) = (1 - s) \sum_{k=0}^{\infty} s^k \int_0^{\infty} G(y-)\, dF_k(y).$

Consequently, setting $M_r(y) = E(N_y^r)$, one obtains

$E(n_1^r) = \int_{0-}^{\infty} M_r(y)\, dG(y).$

In particular

(3)    $E(n_1) = \int_{0-}^{\infty} M_1(y)\, dG(y) = \int_{0-}^{\infty} [1 - G(y-)]\, dM_1(y).$

It is well known, and easily proven, that

$M_1(y) = \sum_{k=0}^{\infty} F_k(y).$

$M_1(y)$ will be used very frequently throughout this paper. We shall therefore drop
the subscript and write $M(y) = M_1(y)$.

Set $\mu = E(X)$ and $\nu = E(Y)$. It is well known (cf. Smith [7]) that if $\mu < \infty$,
$M(y) = y/\mu + o(y)$ as $y \to \infty$. Thus if $\mu < \infty$, by (3) $E(n_1) < \infty$ if and only
if $\nu < \infty$. Similarly, if $\mu = \infty$, then $M(y) = o(y)$ and, hence, $E(n_1) < \infty$ whenever
$\nu < \infty$. The case of $\mu = \infty = \nu$ is special and will not be studied here.
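
Lemma 1 and relation (3) can be checked numerically in the simplest case. The sketch below assumes, for illustration only, Poisson input with a hypothetical rate `lam` and a constant deadtime `d`, so that G is degenerate at d, $F_k$ is an Erlang c.d.f. and $M(d) = \lambda d + 1$; the function name and the truncation point K are our own.

```python
import math


def erlang_cdf(k, lam, y):
    """F_k(y): c.d.f. of the sum of k independent Exp(lam) variables; F_0 = 1."""
    if k == 0:
        return 1.0
    tail = sum(math.exp(-lam * y) * (lam * y) ** i / math.factorial(i)
               for i in range(k))
    return 1.0 - tail


lam, d = 2.0, 1.5     # hypothetical input rate and constant deadtime
K = 60                # truncation point for the infinite sums

# Lemma 1 with G degenerate at d: p_k = F_{k-1}(d) - F_k(d).
p = [erlang_cdf(k - 1, lam, d) - erlang_cdf(k, lam, d) for k in range(1, K + 1)]
total = sum(p)
mean_n1 = sum(k * pk for k, pk in enumerate(p, start=1))

# M(d) = sum_k F_k(d), which equals lam * d + 1 for Poisson input.
M_d = sum(erlang_cdf(k, lam, d) for k in range(K + 1))
print(round(total, 6), round(mean_n1, 6), round(M_d, 6))
```

The probabilities sum to one, and the mean of $n_1$ computed from them agrees with M(d), as (3) requires.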


5. The Z-renewal process. In this section the c.d.f. of Z as well as its Laplace-Stieltjes
transform will be obtained. Consider the notation

$\phi(s) = \int_0^{\infty} e^{-sx}\, dF(x), \qquad \psi(s) = \int_{0-}^{\infty} e^{-sx}\, dG(x),$

$\Phi(s) = \int_0^{\infty} e^{-sx}\, dH(x), \qquad \psi^*(s) = \int_0^{\infty} e^{-sx} G(x-)\, dM(x)$

for all $s \ge 0$. One then obtains
THEOREM 2. For all $z > 0$, $s \ge 0$,

(4)    $H(z) = \int_0^z G(u-)[1 - F(z - u)]\, dM(u)$

(5)    $\Phi(s) = [1 - \phi(s)]\, \psi^*(s).$


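PROOF. Clearly

$H(z) = \Pr[Z \le z] = \sum_{k=1}^{\infty} \Pr[S_{k-1} \le Y < S_k \le z].$

Now

$\Pr[S_{k-1} \le Y < S_k \le z] = \int_{0-}^{z} \int_0^{y} [F(z - u) - F(y - u)]\, dF_{k-1}(u)\, dG(y)$

$\quad = \int_0^z [G(z) - G(u-)] F(z - u)\, dF_{k-1}(u) - \int_{0-}^{z} F_k(y)\, dG(y)$

$\quad = G(z) F_k(z) - \int_0^z G(u-) F(z - u)\, dF_{k-1}(u) - \int_{0-}^{z} F_k(y)\, dG(y)$

$\quad = \int_0^z G(u-)\, dF_k(u) - \int_0^z G(u-) F(z - u)\, dF_{k-1}(u),$

since

$\Pr[Y < S_k \le z] = \int_0^z G(u-)\, dF_k(u),$

and (4) follows by summation over k. To obtain (5) for s > 0, write

$\dfrac{1}{s}\Phi(s) = \int_0^{\infty} e^{-sz} H(z)\, dz$

$\quad = \int_0^{\infty} \int_u^{\infty} e^{-sz} G(u-)\, dz\, dM(u) - \int_0^{\infty} \int_u^{\infty} e^{-sz} F(z - u) G(u-)\, dz\, dM(u)$

$\quad = \dfrac{1}{s}\psi^*(s) - \dfrac{1}{s}\phi(s)\psi^*(s)$

as required. At s = 0, $\Phi$ may properly be defined by $\Phi(0) = 1$. This follows by an
application of an Abelian theorem to (5). That is, consider

$\lim_{s \to 0+} [1 - \phi(s)]\psi^*(s) = \lim_{s \to 0+} \int_0^{\infty} e^{-sx} G(x-)\, dM(x) \Big/ \int_0^{\infty} e^{-sx}\, dM(x)$

$\quad = \lim_{x \to \infty} G(x) = 1$

since $M(x) \to \infty$ as $x \to \infty$.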

Of particular importance to the counter problem is the expectation of the
secondary renewal process. One obtains
THEOREM 3. $E(Z) < \infty$ if and only if $\nu < \infty$ and $\mu < \infty$. Moreover

(6)    $E(Z) = \mu E(n_1) = \mu \int_{0-}^{\infty} M(y)\, dG(y).$

PROOF. The first statement follows from the relationship

$\max(Y_0, X_1) \le Z_1 \le Y_0 + X_{n_1}$ (a.s.).

The second statement is a consequence of a well known result in Sequential
Analysis, for by it

$E(Z \mid Y_0 = y) = \mu E(N_y)$

and (6) follows by integration with respect to dG(y). E(Z) is to be interpreted as
infinity whenever $\nu$ or $E(n_1)$ is infinite. Of course, (6) could also be proven directly
from Theorem 2 using (5).
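
Relation (6) lends itself to a simple simulation check for non-Poissonian input. The sketch below assumes, for illustration only, that X is uniform on (0, 1) and that the deadtime is a hypothetical constant d with $0 \le d \le 1$; for this input the renewal function is $M(d) = e^{d}$ on that range. The function name, seed and sample size are our own choices.

```python
import math
import random


def simulate_type1_first_count(draw_x, deadtime, rng):
    """Return (n_1, Z_1) for a Type I counter that registers an event at time 0."""
    s, k = 0.0, 0
    while s <= deadtime:        # n_1 = min{k : S_k > Y_0}
        s += draw_x(rng)
        k += 1
    return k, s


rng = random.Random(1)
d = 0.5                         # hypothetical constant deadtime Y_0 = d
n_paths = 50000
samples = [simulate_type1_first_count(lambda r: r.random(), d, rng)
           for _ in range(n_paths)]
mean_n1 = sum(k for k, _ in samples) / n_paths
mean_z = sum(z for _, z in samples) / n_paths

mu = 0.5                        # E(X) for X uniform on (0, 1)
M_d = math.exp(d)               # renewal function of uniform input, 0 <= d <= 1
print(round(mean_n1, 3), round(M_d, 3), round(mean_z, 3), round(mu * M_d, 3))
```

The sample mean of $n_1$ should be close to M(d) and the sample mean of Z close to $\mu M(d)$, within sampling error.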

Let N(x) denote the expected number of partial sums of the Z-process less than
or equal to x; that is

$N(x) = \sum_{j=0}^{\infty} H_j(x).$

Define the bias of the counter at time x by B(x) = M(x)/N(x). Then as a consequence
of Theorem 3 and a known asymptotic renewal theorem, one obtains
LEMMA 2. If $\mu < \infty$, then

$\lim_{x \to \infty} B(x) = 1/E(n_1)$

where the right hand side is to be interpreted as zero when $\nu = \infty$.
It may be easily shown that this result is also valid for the Type II and Albert
and Nelson models.


6. The distribution of free-time. Let $W = Z_1 - Y_0$ represent the length of
time the counter is free during successive registrations. Denote its c.d.f. and
L-S transform by K and k respectively. Clearly $E(W) = \mu E(n_1) - \nu$. Moreover,

$K(x) = \int_{0-}^{\infty} \Pr[Z_1 \le x + y \mid Y = y]\, dG(y).$

Under the condition [Y = y], $Z_1$ has a c.d.f. given by (4), but with G degenerate
at y, i.e., G(u) = 1 if $u \ge y$ and G(u) = 0 otherwise. Therefore,

$K(x) = \int_{0-}^{\infty} \int_{y}^{x+y} [1 - F(x + y - u)]\, dM(u)\, dG(y)$

$\quad = 1 - \int_{0-}^{\infty} \int_{0-}^{y} [1 - F(x + y - u)]\, dM(u)\, dG(y).$

It follows similarly that k(s) is the expectation w.r.t. dG(y) of the L-S transform
of $Z_1 - y$ obtained under the condition [Y = y]. By (5) this is seen to be

(7)    $k(s) = [1 - \phi(s)] \int_{0-}^{\infty} \int_{y}^{\infty} e^{-s(x - y)}\, dM(x)\, dG(y).$

According to its definition in section 1, $P_1(t)$ is the probability that the counter
is free at time t. Setting U and V of section 3 equal to Y and W, we have as a
consequence of Theorem 1, the following result: for all $t \ge 0$

(8)    $P_0(t) = \int_0^t [1 - G(t - x)]\, dN(x)$

where

$N(x) = \sum_{j=0}^{\infty} H_j(x).$

This formula differs from equation (26) of Takacs [5]. Moreover, in the limit

$P_0 = \lim_{t \to \infty} P_0(t) = \nu / \mu E(n_1).$

Let the L-S transform of $P_0(t)$ be denoted by

$\pi(s) = \int_{0-}^{\infty} e^{-st}\, dP_0(t).$

Then, by direct computation one obtains from (8)

$\pi(s) = \dfrac{1 - \psi(s)}{1 - \Phi(s)}.$

7. Examples of the Type I counter problem. (a) $F(x) = 1 - e^{-\lambda x}$: This is the
well known Poisson input counter problem which with various assumptions on G
has been studied by several authors. For arbitrary G, the problem was treated
by Takacs [3]. Because of special properties possessed by the exponential distribution,
this particular example may be (and indeed has been) solved in several
different ways. In [6], Smith has shown that much of the essential simplicity of
this case carries over in asymptotic considerations to a wider class of F which
generate so-called quasi-Poisson processes. For the present example, $\mu = 1/\lambda$
and $M(x) = \lambda x + 1$ for $x \ge 0$ and $M(x) = 0$ for $x < 0$. The formulae of the
previous sections become

$\psi^*(s) = \lambda\psi(s)/s,$

$P(s) = s\,\psi(\lambda - \lambda s),$

$E(n_1) = 1 + \lambda\nu,$

$H(z) = \int_{0-}^{z} [1 - e^{-\lambda(z - y)}]\, dG(y),$

$\Phi(s) = \phi(s)\psi(s) = \dfrac{\lambda\psi(s)}{\lambda + s}.$

These last two results may, of course, be obtained immediately from the known
characterization of the exponential distribution that truncation on the left does
not change the form of the distribution function. This implies that $Z_1 = Y_0 + X$
where X is exponentially distributed and is independent of $Y_0$. Finally, for this
example, we have

$\pi(s) = \dfrac{(\lambda + s)[1 - \psi(s)]}{\lambda + s - \lambda\psi(s)}.$

(b) Y = d (a.s.): This important oft-studied case is applicable to counters
for which the deadtime is independent of the intensity or amplitude of the incoming
radioactive pulses. For this case G(x) = 0 or 1 according as x < or $\ge$ d,
and the formulae of the previous sections become

$p_k = F_{k-1}(d) - F_k(d),$

$E(n_1) = M(d),$

$H(z) = 0$ if $z \le d$, and $H(z) = \int_d^z [1 - F(z - u)]\, dM(u)$ if $z > d$,

$\psi^*(s) = \int_d^{\infty} e^{-sx}\, dM(x), \qquad \Phi(s) = [1 - \phi(s)] \int_d^{\infty} e^{-sx}\, dM(x),$

and

$\pi(s) = (1 - e^{-sd}) \Big\{ [1 - \phi(s)] \int_{0-}^{d} e^{-sx}\, dM(x) \Big\}^{-1}.$
(c) $G(y) = 1 - e^{-\beta y}$: The above two cases have been studied previously,
whereas the present case has not, to this author’s knowledge, as yet been considered.
In a different context, (7) has been employed by Scarf [4] for G exponential.
For this example, we have $\nu = 1/\beta$,

$p_k = \{\phi(\beta)\}^{k-1} [1 - \phi(\beta)],$

$P(s) = \dfrac{s[1 - \phi(\beta)]}{1 - s\phi(\beta)},$

$E(n_1) = [1 - \phi(\beta)]^{-1},$

$H(z) = F(z) - \int_{0+}^{z} e^{-\beta x} [1 - F(z - x)]\, dM(x),$

$\Phi(s) = \dfrac{\phi(s) - \phi(\beta + s)}{1 - \phi(\beta + s)},$

$\pi(s) = \dfrac{s[1 - \phi(\beta + s)]}{(\beta + s)[1 - \phi(s)]}$
and 
Mis Ble(s) — ¢(B — s)] 


(8 — s)[1 — ¢(s)][1 — (8 — 8)]" 

8. The general Type II counter. This problem is a very difficult one to solve
in general. Discussions of the general problem have been given by Takacs [5]
and Pollaczek [10]. Certain particular cases have been studied in the literature
in greater detail. For example: Poisson input and constant deadtime by Feller
[2], Poisson input and general deadtime by Takacs [3] and, with a different
approach, by Chernoff and Daly [11], and exponentially distributed deadtime by
Takacs [5]. The same notation as that used for the Type I problem will be employed
in this section, but with the corresponding definition of $n_j$, namely (1).

In this model, it is simpler to evaluate $r_k$ than $p_k$, contrary to what was observed
in the Type I problem. For $k \ge 1$

(9)    $r_k = \Pr[S_r + Y_r < S_k;\ r = 0, 1, \cdots, k - 1]$
       $\quad = \int_0^{\infty} \cdots \int_0^{\infty} G(x_1 + x_2 + \cdots + x_k -) \cdots G(x_k -)\, dF(x_1) \cdots dF(x_k).$

Therefore, by the same argument leading to (2), one obtains the relationship

(10)    $E(n_1) = m \big/ \lim_{k \to \infty} r_{km}$

where m is the g.c.d. of those integers n for which $p_n > 0$. If X < Y (a.s.) set
m = 0 and $n_1 = \infty$ (a.s.). In all other cases Pr [X > Y] > 0. However, since
$p_1 = \Pr[X > Y]$, one obtains m = 1. That is to say, whenever Pr [X > Y] > 0

$E(n_1) = 1 \big/ \lim_{k \to \infty} r_k.$

With a knowledge of $r_k$, one is able to compute $E(n_1)$ and hence the expectation
of the secondary renewal process.

As before, set $Z_0 = 0$ (a.s.) and $Z_j = S_{n_j} - S_{n_{j-1}}$. Clearly the $Z_j$’s form a
renewal process. The problem of deriving an explicit expression for H, the common
c.d.f. of the Z-renewal process, is extremely difficult. However, it is possible
to display an integral equation which formally, but not always in practice,
determines H. In section 10, an example will be given for which the solution is
readily attained from this integral equation whereas it is not easily derived by
other methods. Takacs [5] has, for the Type II problem, obtained an integral
equation in N(t), the expected number of counts (partial sums of the Z-process)
in [0, t] for all $t \ge 0$. These two representations are equivalent in the sense that
H and N are uniquely determined one by the other. More precisely, for s > 0,
the relationship between H and N is given by

(11)    $\int_{0-}^{\infty} e^{-st}\, dN(t) = \sum_{j=0}^{\infty} \int_{0-}^{\infty} e^{-st}\, dH_j(t) = \dfrac{1}{1 - \Phi(s)}.$

THEOREM 4. For all $z \ge 0$

(12)    $H(z) = \int_0^z \int_{0-}^{z-x} [1 - H(z - x - t)]\, G(x + t-)\, dN(t)\, dF(x)$

and for s > 0

(13)    $\Phi(s) = \lambda(s)[1 + \lambda(s)]^{-1}$

where

(14)    $\lambda(s) = \int_{0-}^{\infty} \int_0^{\infty} e^{-s(x + t)} G(x + t-)\, dF(x)\, dN(t).$

(Notice that because of (11), $\lambda(s) + 1$ is the L-S transform of N.)

PROOF. (12) is obtained as follows. $[Z_1 \le z]$ is the union of two disjoint events,
A and B say, where


and 
< z — X, for some) 2 1). 


~ 
“~ 
oe 
IA 
| 
~ 
A 
N 
/ 


( learly 
(A) — . lI (; . 
Pr [ G(z—) ¢ r) 


Under the condition, [2 > Yo = y 2 x = X,] = C say, 


Pr (B\|C) =1-— >oPr(Z; sy —2,Zin >z—a) 


7=0 


j= 


=|— {1 — H(z — x — t)| dN(t) 
Jo— 


- | [1 — H(z — x — t)| dN(t). 


Therefore, 
H(z) = I G(x—) dF(x) + | [ | it — HG ~ 2 — 0) dN ay) aF (2) 
0— “0— z y-<z 


and an interchange of integration gives 


H(z) = N(O)(F(z) — F(z—)|G(z-) 
+ | [ {1 — H(z — x — d\G(e + t—) dN(t) dF (x 
0 


“0— “0— 


7 | | (1 — H(z — x — d\G(a + t—) dN(t) dF(z) 


“0 Jo- 


as required. For the proof of (13), consider changes of integration according to 


[ dz [ dx [ ‘ dt = [ dx [ dz [ ° dt = f dx [ dt : dz 


0 “0 “0 “0 


0 







It then follows that, for s > 0,

$\dfrac{1}{s}\Phi(s) = \int_0^{\infty} e^{-sz} H(z)\, dz = \int_0^{\infty} \int_{0-}^{\infty} e^{-s(x + t)}\, \dfrac{1}{s}[1 - \Phi(s)]\, G(x + t-)\, dN(t)\, dF(x).$

Solving for $\Phi(s)$ gives the desired result. As was stated earlier, Takacs [5] derived
an integral equation in N(t), which may be shortened to read

$N(t) - 1 = \int_0^t G(x-)\, dW(x)$

where

$W(x) = \int_{0-}^{x} F(x - y)\, dN(y).$

Upon taking Laplace transforms of both sides one may check that it satisfies the
relationship (11).

Of particular interest is the expectation of the secondary renewal process, 
namely E(Z). As in the Type I problem, it follows from known results 
of Sequential Analysis that 


nmi 


(15) E(Z) =E (= x,) = pE(n). 


j=l 


From the above theorem, one obtains 


E(Z) = lim ss = 1/lim sX(s). 


s+0 s s+0 
Thus, by (10), one obtains a double relationship 


1/E(m) = v lim sX(s) = lim x. 
sO ko 
Although it may well be that in a particular example one of the above limits 
will be computable, in most cases they will be unwieldy. For example, even 
in the case of Poisson input, the quantities r, are complicated expressions, al- 
though E(n;) is a simple expression best obtained in an entirely different way 
using the particular properties of the exponential distribution. The p,’s may be 
expressed in terms of the r;’s as follows; 


\[ p_n = -\sum\nolimits^{*} k_{\cdot}!\, \prod_{j=1}^{n} \frac{(-r_j)^{k_j}}{k_j!} \tag{16} \]
where $k_{\cdot} = \sum_{j=1}^{n} k_j$ and $\sum^{*}$ denotes summation over all vectors of integers
$(k_1, k_2, \cdots, k_n)$ for which $\sum_{j=1}^{n} j k_j = n$. However, (16) will be, in most cases,
very unwieldy, especially when one recalls the complicated structure of the $r_j$'s.


9. The case of constant deadtime. Partial results for this example have been
given by Takacs [5] for the Albert and Nelson model to be studied in the next
section. Also, this case has been studied from a different viewpoint by Smith
[6]. We shall study this special case in full. Set $Y = d$ (a.s.). Then $G(x) = 0$ or
1 according as $x < d$ or $x \ge d$. From (9) we obtain for $k \ge 1$
\[ r_k = 1 - F(d) = q, \text{ say.} \]
Consequently by (10)
\[ E(n_1) = [1 - F(d)]^{-1} = q^{-1} \tag{17} \]
which is interpreted as being equal to $\infty$ if $1 = F(d)$. Using the notation
introduced in section 4, one obtains
\[ R(s) = \sum_{n=1}^{\infty} r_n s^n = qs(1 - s)^{-1}, \]
and hence
\[ P(s) = \frac{sq}{sq + 1 - s}. \]
From this relationship, or by direct computation, one obtains
\[ p_n = q(1 - q)^{n-1}. \]


Therefore, $n_1$ has a Pascal (or geometric) distribution. The quickest way to
obtain $H$ and $\Phi$ for this example is as follows. Clearly $H(z) = 0$ for $z \le d$. For
$z \ge d$
\[ H(z) = \sum_{n=1}^{\infty} \Pr[S_n \le z \mid n_1 = n]\, p_n = q \sum_{n=1}^{\infty} \Pr(S_n \le z \mid n_1 = n)(1 - q)^{n-1}. \]
Now
\[ \Pr(S_n \le z \mid n_1 = n) = \Pr(S_n \le z \mid X_j \le d,\ 1 \le j \le n-1,\ X_n > d) = \Pr[U_1 + U_2 + \cdots + U_{n-1} + V \le z] \]
where the $U_i$'s and $V$ are mutually independent with c.d.f.'s given by
\[ \Pr[U_i \le u] = K(u) = F(u)/F(d) \qquad (u \le d,\ 1 \le i < n), \]
\[ \Pr[V \le u] = L(u) = \frac{F(u) - F(d)}{1 - F(d)} \qquad (u \ge d). \]
Therefore, for $z \ge d$
\[ H(z) = q \sum_{n=1}^{\infty} (1 - q)^{n-1} \int_{d}^{z} K_{n-1}(z - u)\, dL(u) \tag{18} \]
where $K_n$ denotes the $n$th convolution of $K$ with itself. It is then immediate that
\[ \Phi(s) = \frac{\int_{d}^{\infty} e^{-sx}\, dF(x)}{1 - \int_{0}^{d} e^{-sx}\, dF(x)}. \tag{19} \]
One may check that expressions (18) and (19) satisfy the equations of Theorem 4. 
In [6], Smith has obtained for this case $N(t) = 1$ for $0 \le t \le d$ and
\[ N(t) - 1 = \int_{0-}^{t-d} [F(t - x) - F(d)]\, dM(x) \]
for $t > d$. By means of (11), one may show that this expression agrees with (19).
(19) has also been obtained by Takacs [3]. From (15) and (17) it follows that
$E(Z) = \mu q^{-1}$. One may also compute


i 
var (Z) = q ‘var (X) + 2uq . [ x dF(x) 
“0 
which disagrees with the expression given in Theorem 7 of [6]. 
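The mean values in (17) and $E(Z) = \mu q^{-1}$ can be checked by simulation. The following is a minimal sketch, not part of the original paper: it assumes Poisson input (exponential interarrival times with a hypothetical rate `rate`), so that $q = e^{-\text{rate}\cdot d}$, and it uses the fact stated above that a cycle ends at the first interarrival time exceeding $d$. The function names and the chosen values of `d`, `rate` and `trials` are illustrative assumptions.

```python
import math
import random

def one_cycle(d, rate, rng):
    """Sum exponential interarrival times until the first one exceeds d.

    Returns (Z, n1): the cycle length and the number of pulses it contains.
    """
    total, count = 0.0, 0
    while True:
        x = rng.expovariate(rate)
        total += x
        count += 1
        if x > d:
            return total, count

def estimate_means(d, rate, trials=200_000, seed=1):
    """Monte Carlo estimates of E(Z) and E(n1) for constant deadtime d."""
    rng = random.Random(seed)
    sum_z, sum_n = 0.0, 0
    for _ in range(trials):
        z, n = one_cycle(d, rate, rng)
        sum_z += z
        sum_n += n
    return sum_z / trials, sum_n / trials

d, rate = 0.7, 1.0
q = math.exp(-rate * d)   # q = 1 - F(d) when F is exponential (assumption)
mu = 1.0 / rate           # mean interarrival time
mean_z, mean_n = estimate_means(d, rate)
print(f"E(n1): simulated {mean_n:.4f}, q^-1    = {1 / q:.4f}")
print(f"E(Z):  simulated {mean_z:.4f}, mu q^-1 = {mu / q:.4f}")
```

The simulated averages should agree with $q^{-1}$ and $\mu q^{-1}$ up to Monte Carlo error; other interarrival distributions can be tried by replacing the call to `expovariate` and the expression for `q`.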


For this example, not only is it possible to compute $P_0(t)$, the probability that
the counter is free, but one may also derive the quantities $P_k(t)$ defined by
\[ P_k(t) = \Pr[S_j + Y_j \ge t \text{ for exactly } k \text{ values of } j] \]
for $k = 0, 1, 2, \cdots$. That is, $P_k(t)$ denotes the probability that $k$ impulses are
in process at time $t$. Now then


2 
P,(t) = > Pr[S; St —d < Sins S Sink < tS Sypeusl 
= 


t—d t—z 
= [ / [wilt — 2 — y) — F.(t — x — y)| dF(y) dM(z). 
“0 t—d—z 


In particular 


at 1 
(20) P(t) =| [1 — F(t— x) dM(a). 
Jo 
Define the real functions $h_m$ ($m \ge 0$) as follows: for $v \le d$ set $h_m(v) = 1$ and
for $v \ge d$ set
\[ h_m(v) = 1 - \int_{0}^{v-d} F_m(v - y)\, dF(y). \]


With these definitions we may write for k = 1 


i 
P(t) = [ [F.(t — x) — Fyalt — x) — Alt — vr) + hilt — x)| dM (2). 


“0 
The functions $h_m$ and $1 - F_m$ ($m \ge 0$) clearly satisfy the conditions of Theorem
S, by which 


(21) P = lim P,(t) = uf [F'.(v) - Fy4i(v) = h,(v) 4 hy,-1(v)] dv. 
d 


t +0 


Moreover, by definition 


a@ 


2 v—d 
[he(v) — hyalv)] dv = / I [Fra(v — y) — Fi(v — y)| dF(y) dv 
/0 d 0 


[ / [Fxalv — y) — F,(v — y)| dv dF(y) 
“0 u+d 


d 
=u [ [F,.r(v) — F,(v)] dv. 
Jo 
Therefore, by (20) and (21)
\[ P_k = \mu^{-1} \int_{0}^{d} [F_{k-1}(v) - 2F_k(v) + F_{k+1}(v)]\, dv, \]
\[ P_0 = 1 - \mu^{-1} \int_{0}^{d} [1 - F(v)]\, dv. \]
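As a plausibility check on these limiting probabilities, one can specialize to Poisson input, where the number of pulses arriving in an interval of length $d$ is Poisson distributed. The sketch below is an illustration under stated assumptions, not the author's computation: it takes $F_k$ to be the $k$-fold convolution of an exponential $F$ with a hypothetical rate `rate` (so $F_0$ is the unit step and $\mu = 1/\text{rate}$), and it evaluates the integrals with a simple Simpson rule written for the example.

```python
import math

def erlang_cdf(k, v, rate):
    """F_k(v): c.d.f. of a sum of k exponential(rate) variables; F_0 is the unit step."""
    if v < 0:
        return 0.0
    if k == 0:
        return 1.0
    tail = sum(math.exp(-rate * v) * (rate * v) ** i / math.factorial(i)
               for i in range(k))
    return 1.0 - tail

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def limiting_pk(k, d, rate):
    """P_k from the two displayed formulas, for exponential interarrival times."""
    mu = 1.0 / rate
    if k == 0:
        return 1.0 - simpson(lambda v: 1.0 - erlang_cdf(1, v, rate), 0.0, d) / mu

    def integrand(v):
        return (erlang_cdf(k - 1, v, rate)
                - 2.0 * erlang_cdf(k, v, rate)
                + erlang_cdf(k + 1, v, rate))

    return simpson(integrand, 0.0, d) / mu

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

d, rate = 0.7, 2.0
for k in range(5):
    print(k, limiting_pk(k, d, rate), poisson_pmf(k, rate * d))
```

Under these assumptions the two printed columns coincide to numerical accuracy, which is what one expects if $P_k$ is the equilibrium probability of exactly $k$ arrivals in an interval of length $d$.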


10. The Albert and Nelson generalization. Let $p \in [0, 1]$. Define
\[ Y^{(p)} = \begin{cases} Y & \text{with probability } p \\ 0 & \text{with probability } 1 - p = q \end{cases} \]
which has c.d.f. $G_p$, where $G_p(0) = q$, $G_p(x) = q + pG(x)$ for $x > 0$. Albert and
Nelson [1] suggested as a generalization of the Types I and II counter models,
the model in which the deadtime caused by an incoming pulse is $Y$ or $Y^{(p)}$
according as the pulse is registered or not. Formally, define $n_0 = 0$ (a.s.) and for $j \ge 1$
\[ n_j = \min\{k \in I^{+} : S_i + Y_i^{(p)} \le S_k\ (n_{j-1} < i < k),\ S_{n_{j-1}} + Y_{n_{j-1}} \le S_k\} \tag{22} \]
where as usual the subscript on $Y_i^{(p)}$ denotes independent and identically distributed random
variables with c.d.f. $G_p$. The purpose of this section is to show that the c.d.f. $H$
of the secondary renewal process, $Z_j = S_{n_j} - S_{n_{j-1}}$ ($j > 0$), obtained for this
generalization is in fact completely solved once the general Type II problem
is solved, and in this sense this generalization is a very slight one.

Let $Z^{(p)}$ be the secondary renewal process of a Type II counter model in which
the deadtime r.v. is $Y^{(p)}$. Let $H^{(p)}$ denote its c.d.f., $\Phi^{(p)}$ its characteristic function
and $N^{(p)}(x) = \sum_{j=0}^{\infty} H_j^{(p)}(x)$. The distribution function of the $Z$-renewal process
may then be given by

THEOREM 5. For all $z \ge 0$
\[ H(z) = \int_{0}^{z} \int_{0-}^{z-x} [1 - H^{(p)}(z - x - y)]\, G(x + y-)\, dN^{(p)}(y)\, dF(x) \]
and for $s > 0$
\[ \Phi(s) = [1 - \Phi^{(p)}(s)] \int_{0-}^{\infty} \int_{0}^{\infty} e^{-s(x+y)}\, G(x + y-)\, dF(x)\, dN^{(p)}(y). \tag{23} \]

This theorem is proven in the same way as Theorem 4 upon noticing that in
evaluating $\Pr(B \mid C)$, one need only consider a Type II model with r.v.'s $X$ and
$Y^{(p)}$. Thus, although at first glance one might suspect that this more general
model would offer difficulties peculiar to itself, it is seen that a solution of the
corresponding Type II problem automatically provides a solution of the general
problem. For $p = 1$, this model reduces to the Type II model and for $p = 0$,
to the Type I model, as can be seen by comparing the definition of the $n$-process,
(22), for these two values of $p$.

EXAMPLE. In [5], Takacs works out the special case of the Albert and Nelson
model in which $Y$ is constant a.e. As a further example, we evaluate here the
case in which $G(x) = 1 - e^{-\lambda x}$ ($x \ge 0$). As was pointed out above, it will be
sufficient to solve the Type II problem in which the deadtime, $Y^{(p)}$, has
c.d.f. $G^{(p)}(x) = 1 - pe^{-\lambda x}$ ($x \ge 0$) and zero elsewhere. For this case we have
by (14),
dvs) = [ | “1 — pe **™] dF(z) dN (y) 
/o— /o 


= | els) — pe“ ™%o(s + d)] dN (y) 
9 — 


¢(s) ¢g(s + 2) 
~ 1—o(s) PT — eM Fd)’ 
Therefore 
x? (s) = ¢g(s) — gels + A) _ ac + ) x's + y) 
1 — ¢(s) “1 = ¢(s) 
since 1 — &”(s) = [1 + ‘”(s)]*. Since this relation holds for all s > 0, we 


obtain by recursion that for all n21 


n bul m 
= ye (—p)? 2 — II a 
¢j kno Ll — Ge 


ox\) 
n 
\ntl Pk+l (p)¢ 
(—p) | me: - (gs + A + nA) 
ken — Gk 
where for convenience we have set ¢; = ¢(s + jd). Since ¢(s) — 0 and X(s) — 0 
as s > © we finally obtain 


”’(s) = lim © (— yy’; 


Nw GS) ) = k=O - 


(24) ¥ (—p)* TI 


k=) i= 


$|" 


as 


ty (- eI 


7=0 k=O — - 





754 RONALD PYKE 


Thus &”(s) = \(s)[1 + A“ (s)]“, the solution to the Type II model in which 
the deadtime distribution is G°” (x) = 1 — pe (x = 0), is determined. From 
Theorem 5, in particular equation (23), one obtains 


1 


#(s) = o(s) — ols + A) - 


1+ r?(s + A) 

1 + d(s) 
Upon substitution of (24) into this expression, one obtains the solution to the 
Albert and Nelson model with exponential deadtime. When p = 1, (23) yields 
the solution to the corresponding Type II problem with exponential deadtime 
as given explicitly by Takacs [5] and implicitly by Pollaczek [10]. 


g(s) — ¢o(s + d) 
MINIMUM VARIANCE UNBIASED ESTIMATION FOR THE
TRUNCATED POISSON DISTRIBUTION


By R. F. Tate¹ and R. L. Goen²
University of Washington


1. Summary. A minimum variance unbiased estimator is provided for the 
parameter of a truncated Poisson distribution in the case of truncation on the 
left. In this connection the distribution is obtained for the sum of n independent 
identically distributed truncated Poisson random variables, and then well-known 
properties of sufficient statistics are employed to obtain the estimator. For the 
case of truncation away from the zero value results are expressed in terms of 
Stirling numbers of the second kind. The estimator has a particularly simple 
form and tables are available for its computation. For the general case results 
are expressed in terms of what we call generalized Stirling numbers. As a by- 
product of the statistical considerations there arises an identity between general- 
ized Stirling numbers which may be useful in the study of Difference Equations. 


2. Introduction. Numerous articles have been written on the subject of the 
estimation of the parameter of a truncated or censored Poisson distribution. Our 
work concerns the former distribution. The two types of distributions can be 
distinguished as follows: Consider an ordinary Poisson random variable with 
range {0, 1, 2,--- }, and let A be a subset of this range. If values in the set 
A cannot be members of a sample, then a random observation of the restricted 
variable is said to have a truncated Poisson distribution or to be truncated 
away from A. On the other hand there is the possibility for values in the set A 
to be members of a sample, but for some reason not distinguishable from one 
another. In this case a random observation of the restricted variable is said to 
have a censored Poisson distribution. 

A situation calling for the truncated Poisson distribution would occur when one 
wishes to fit a distribution to Poisson-like data consisting of numbers of indi- 
viduals in certain groups which possess a given attribute, but in which a group 
cannot be sampled unless at least a specified number of its members have the 
attribute. For example, the group may be a household of people, and the at- 
tribute measles; the specified number would then be one. A censored Poisson 
distribution is used most often in connection with pooled data. 

The estimation problem for both the truncated and the censored cases has 
been discussed extensively from the point of view of maximum likelihood by 
Cohen [1]. Earlier results based on maximum likelihood were obtained by 

Received July 3, 1956. 
1 Research Sponsored by ONR, Navy Theoretical Statistics Project. 


2 Submitted in partial satisfaction of the requirements for the degree of Master of
Science in Mathematical Statistics. 







Tippett [9], David and Johnson [2], and Rider [8]. Various other estimators were 
proposed by Moore [5], [6], Rider [8], and Plackett [7]. Plackett appears to be the 
only writer ever to propose an unbiased estimator for any of the cases of a 
truncated or censored Poisson distribution. His estimator for the parameter of a 
Poisson distribution truncated away from 0, which will arise several times during 
our discussion, is 


\[ \lambda^{*} = \frac{1}{n} \sum X_i, \]
where the summation is taken over all $X_i \ge 2$.

The present paper is concerned with unbiased estimators for the case of tail 
truncation. It can readily be shown that truncation on the right, that is away 
from A = {c,c + 1,--- }, precludes the existence of an unbiased estimator. 
The argument is based on the identity of two power series; details will be omitted. 

Assume that $A = \{0, 1, 2, \cdots, c\}$ for some $c \ge 0$. Let the Poisson density
be denoted by
\[ f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \cdots. \]
The density of the restricted random variable which is truncated away from
$A$ is then
\[ g(x; \lambda, c) = \frac{e^{-\lambda} \lambda^{x}}{x! \sum_{j=c+1}^{\infty} e^{-\lambda} \lambda^{j} / j!}, \qquad x = c+1, c+2, \cdots. \]
Consider a sample of $n$ independent observations $X_1, X_2, \cdots, X_n$, each with
density $g(x; \lambda, c)$, and let
\[ T_c = \sum_{i=1}^{n} X_i. \]
It is well known that $\sum X_i$ is a sufficient statistic for the family $\{f(x; \lambda)\}$. A
result of Tukey [10] states that sufficiency is preserved under truncation away
from any Borel set in the range of $X$. Hence, in the case at hand $T_c$ is sufficient
for $\{g(x; \lambda, c)\}$. It can be verified that $T_c$ is also complete.

For the case $c = 0$ the distribution of $T_0$ and the minimum variance unbiased
estimator $\tilde\lambda_0$ are derived in Section 3. This is at the same time the most important
case for applications and the easiest with which to deal. A recent extension of
the table of Stirling numbers of the second kind makes $\tilde\lambda_0$ easy to compute for
many values of $n$ and $T_0$.

In order to express the results for the general case $c \ge 1$ in a simple form it is
necessary to introduce the notion of a generalized Stirling number. This will be
done in Definition 3 below.

The following relations are quoted here for later reference. 
Definition 1 (Jordan [3], p. 169): Stirling number of the second kind.
\[ \mathfrak{S}_t^n = \frac{(-1)^n}{n!} \sum_{k=0}^{n} \binom{n}{k} (-1)^k k^t, \quad t = n, n+1, \cdots; \qquad \mathfrak{S}_t^n = 0 \text{ for } t < n. \]
Definition 2 (Jordan, p. 185): 


re 2p—i , 2p a | = 
Cia= >, (—1)' ( Je P 
1 


j=p-+ 


Property 1 (Jordan, p. 169):
\[ \mathfrak{S}_{t+1}^{n} = \mathfrak{S}_{t}^{n-1} + n\, \mathfrak{S}_{t}^{n}. \]


Property 2 (Jordan, p. 186): 


; . 
~ n+l > ti ae 
Si+1 = op Sy 

jan \J ‘ 


Property 3 (Jordan, p. 171): 
Craton = 1 Crn-t,e-se-a + (fF — 1)C rn, 4-20 « 
The generalized Stirling number will be introduced by 
Definition 3: 
@, = ( =r >> = = — i yn Do | 
’ 7 or - © sksss) 1 uy 


where k, 0,1, --: ,n3¢ = 1,2, --- ,e+2;¢ = nic + 1), n(c + 1) + 1,°-: ; 
andthe summation is taken overall (kh, , «++ ,ke42) such that ky4+ --+ + kege = nn. 
Properly }: 


0 ~ nh 
@,.: = S:. 


Property 5:
tl 5 
Gait _ C; n,t—2n + 


To verify Property 5 write @}, , as an iterated sum over k; and kz , and use Defini- 
tion 2. 

In Section 4 the distribution of $T_c$ and the minimum variance unbiased
estimator $\tilde\lambda_c$ are derived for the general case. There, also, a simple unbiased estimator
based on one observation is given for $\lambda$, and is used, via the Lehmann-Scheffé-Blackwell
method, to reproduce $\tilde\lambda_c$. When equated, the two expressions for $\tilde\lambda_c(t)$
provide an identity for the numbers $\mathfrak{G}^{c}_{n,t}$. The estimator used is related to
Plackett's estimator $\lambda^{*}$.


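3. The case c = 0. Let $X_1, X_2, \cdots, X_n$ be independent random variables,
each with density $g(x; \lambda, 0)$ and characteristic function $\phi_0(\alpha)$. Then $T_0$ has the
characteristic function
\[ \psi_0(\alpha) = [\phi_0(\alpha)]^n = \left( \sum_{x=1}^{\infty} \frac{e^{i\alpha x} e^{-\lambda} \lambda^{x}}{x!\, (1 - e^{-\lambda})} \right)^{n}. \]
Using the fact that $f(x; \lambda)$ has characteristic function $\exp \lambda(e^{i\alpha} - 1)$, and
simplifying, we have
\[ \psi_0(\alpha) = \left( \frac{e^{\lambda e^{i\alpha}} - 1}{e^{\lambda} - 1} \right)^{n}. \]
The inversion formula for characteristic functions shows that $T_0$ has the density
\[ p_0(t) = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \psi_0(\alpha) e^{-i\alpha t}\, d\alpha. \]
A binomial expansion for the numerator of $\psi_0(\alpha)$ shows that $p_0(t)$ is
\[ \frac{(-1)^n}{(e^{\lambda} - 1)^n} \sum_{k=0}^{n} \binom{n}{k} (-1)^k\, \frac{1}{2\pi} \int_{-\pi}^{+\pi} e^{k\lambda e^{i\alpha}} e^{-i\alpha t}\, d\alpha. \]
Since inversion of $\exp \lambda(e^{i\alpha} - 1)$ results in $e^{-\lambda}\lambda^{t}/t!$, the integral in $p_0(t)$ is
$(k\lambda)^{t}/t!$, and from Definition 1 we finally arrive at
\[ p_0(t) = \frac{\lambda^{t}\, n!}{(e^{\lambda} - 1)^n\, t!}\, \mathfrak{S}_t^n, \qquad t = n, n+1, \cdots. \]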
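The density of $T_0$ can be checked numerically against a direct convolution of the zero-truncated Poisson density. The following sketch is an illustration only; the function names are hypothetical, `stirling2(t, n)` implements Definition 1 with exact integers, and the values of `n` and `lam` are arbitrary choices.

```python
from fractions import Fraction
from math import comb, exp, factorial

def stirling2(t, n):
    """Stirling number of the second kind via Definition 1 (exact integer)."""
    if t < n:
        return 0
    total = sum((-1) ** k * comb(n, k) * k ** t for k in range(n + 1))
    return (-1) ** n * total // factorial(n)

def pmf_T0(t, n, lam):
    """p_0(t) = lam^t n! S(t, n) / ((e^lam - 1)^n t!)."""
    coeff = Fraction(factorial(n) * stirling2(t, n), factorial(t))
    return float(coeff) * lam ** t / (exp(lam) - 1.0) ** n

def g0(x, lam):
    """Poisson density truncated away from zero, g(x; lam, 0)."""
    return lam ** x / (factorial(x) * (exp(lam) - 1.0)) if x >= 1 else 0.0

def pmf_by_convolution(t, n, lam):
    """Density of the sum of n truncated variables, by brute-force convolution."""
    if n == 1:
        return g0(t, lam)
    return sum(g0(x, lam) * pmf_by_convolution(t - x, n - 1, lam)
               for x in range(1, t - n + 2))

n, lam = 3, 1.5
for t in range(n, n + 5):
    print(t, pmf_T0(t, n, lam), pmf_by_convolution(t, n, lam))
print("total mass:", sum(pmf_T0(t, n, lam) for t in range(n, 80)))
```

The two columns should agree, and the total mass should be 1 up to rounding, which is the identity $(e^{\lambda} - 1)^n = n! \sum_{t \ge n} \mathfrak{S}_t^n \lambda^t / t!$ in numerical form.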

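It was noted in the introduction that $T_0$ is a complete sufficient statistic for
the family $\{g(x; \lambda, 0)\}$. It then follows that if an unbiased estimator based on
$T_0$ exists for $\lambda$, it will be unique and have the property of minimum variance
(see Lehmann [4]). The condition for unbiasedness of $\tilde\lambda_0$ is
\[ \sum_{t=n}^{\infty} \tilde\lambda_0(t)\, p_0(t) = \lambda. \]
In view of the fact that
\[ (e^{\lambda} - 1)^n = n! \sum_{t=n}^{\infty} \mathfrak{S}_t^n\, \frac{\lambda^{t}}{t!}, \]
the condition becomes
\[ \sum_{t=n}^{\infty} \tilde\lambda_0(t)\, \frac{\lambda^{t}}{t!}\, \mathfrak{S}_t^n = \sum_{t=n}^{\infty} \frac{\lambda^{t+1}}{t!}\, \mathfrak{S}_t^n. \]
Comparing coefficients of powers of $\lambda$, we have the minimum variance unbiased
estimator
\[ \tilde\lambda_0(t) = t\, \frac{\mathfrak{S}_{t-1}^{n}}{\mathfrak{S}_t^n}. \]
Property 1 gives the alternative form³
\[ \tilde\lambda_0(t) = \frac{t}{n} \left( 1 - \frac{\mathfrak{S}_{t-1}^{n-1}}{\mathfrak{S}_t^n} \right). \]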
n CH 


Mr. Francis L. Miksa has computed the most complete table to date of Stirling
numbers of the second kind.⁴ Miksa's table gives $\mathfrak{S}_t^n$ for $n = 1(1)t$, $t = 1(1)50$.
The quantity needed for the estimation of $\lambda$, the parameter of a Poisson
distribution truncated away from zero, is
\[ \tilde\lambda_0(t) = \frac{t}{n} \left( 1 - \frac{\mathfrak{S}_{t-1}^{n-1}}{\mathfrak{S}_t^n} \right) = \frac{t}{n}\, C(n, t). \]
A table of $C(n, t)$ for $n = 2(1)t - 1$, $t = 3(1)50$ appears at the end of this paper.
Note that for certain values of $(n, t)$, $C(n, t)$ has not been tabulated, since

$C(n, t) = 0$ when $n = t \ge 1$,

$C(1, t) = 1$ when $t \ge 2$.

All other missing entries are 1 (correct to 5 decimals); for example, $C(2, t) = 1$
for $t \ge 19$.
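Entries of $C(n, t)$, and hence the estimator $(t/n)\,C(n, t)$, can be reproduced with exact integer arithmetic. The sketch below is a minimal illustration with hypothetical function names; it computes the Stirling numbers from Definition 1 and uses rational arithmetic so that no rounding occurs before the final conversion.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(t, n):
    """Stirling number of the second kind via Definition 1 (exact integer)."""
    if t < n:
        return 0
    total = sum((-1) ** k * comb(n, k) * k ** t for k in range(n + 1))
    return (-1) ** n * total // factorial(n)

def C(n, t):
    """C(n, t) = 1 - S(t-1, n-1) / S(t, n), as an exact fraction (needs t >= n >= 1)."""
    return 1 - Fraction(stirling2(t - 1, n - 1), stirling2(t, n))

def mvue(n, t):
    """Minimum variance unbiased estimate (t / n) * C(n, t) for truncation away from zero."""
    return Fraction(t, n) * C(n, t)

# A few entries in the tabulated form 10^5 * C(n, t).
for n, t in [(2, 3), (2, 4), (3, 4), (4, 5), (6, 7)]:
    print((n, t), round(1e5 * float(C(n, t))))

# Example: a sample of n = 5 observations with total t = 12.
print(float(mvue(5, 12)))
```

Because Python integers have arbitrary precision, the size of the Stirling numbers is not an obstacle here; the same functions work unchanged for $t$ beyond 50.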


For values of $t$ which are large compared to $n$, the asymptotic expression
$\mathfrak{S}_t^n \sim n^{t}/n!$ is available (Jordan [3], p. 173). Thus we have
\[ \tilde\lambda_0(t) \approx \frac{t}{n} \left( 1 - \left( \frac{n-1}{n} \right)^{t-1} \right). \]

The percentage error of approximation, $E(n, t)$, decreases with increasing $t$
when $n$ is fixed. For fixed $t/n$ the percentage error increases with increasing $n$;
however, the rate of increase falls off rapidly, as can be seen from the following
short table, computed for $t/n = 4$.

(n, t)     (2, 8)   (4, 16)   (6, 24)   (8, 32)   (10, 40)   (12, 48)
E(n, t)    .006%    .046%     .073%     .090%     .101%      .107%
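The size of the approximation error can be explored directly. The sketch below is an illustration under an assumption: it takes the percentage error to be $100\,(\text{approximate} - \text{exact})/\text{exact}$ applied to $C(n, t)$, which is one natural reading of $E(n, t)$ but is not spelled out in the text. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(t, n):
    """Stirling number of the second kind via Definition 1 (exact integer)."""
    if t < n:
        return 0
    total = sum((-1) ** k * comb(n, k) * k ** t for k in range(n + 1))
    return (-1) ** n * total // factorial(n)

def c_exact(n, t):
    """Exact C(n, t) = 1 - S(t-1, n-1) / S(t, n)."""
    return 1 - Fraction(stirling2(t - 1, n - 1), stirling2(t, n))

def c_approx(n, t):
    """Asymptotic form 1 - ((n-1)/n)^(t-1), from S(t, n) ~ n^t / n!."""
    return 1.0 - ((n - 1) / n) ** (t - 1)

def percent_error(n, t):
    """Assumed definition: relative error of the approximation, in percent."""
    exact = float(c_exact(n, t))
    return 100.0 * (c_approx(n, t) - exact) / exact

for n in (2, 4, 6, 8, 10, 12):
    print((n, 4 * n), f"{percent_error(n, 4 * n):.3f}%")
```

With this definition the printed errors are of the same order as those in the short table and grow slowly with $n$ along $t/n = 4$.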


Since $E(15, 50) = .4\%$, we may consider the approximation quite satisfactory
for $2 \le n \le 15$, $t \ge 51$. For larger values of $n$ we must have $t/n$ larger than
50/15, but not necessarily much larger, in view of the above table. For larger
values of $n$ one may also resort to the use of the unbiased estimator of Plackett,
which was defined in the introduction and can also be written⁵
\[ \lambda^{*} = \frac{t}{n} \left( 1 - \frac{n_1}{t} \right), \]
n t/’ 


3 This form may be thought of as a slight change of the usual estimator t/n due to the
missing zero class.

4 This table is as yet unpublished.

5 In this connection see also the definition of $V_c$ in Section 4.
where $n_1$ is the number of observations in the sample which have the value 1;
or one can use the maximum likelihood estimator $\hat\lambda$, which is the solution of the
equation
\[ \frac{t}{n} = \frac{\hat\lambda}{1 - e^{-\hat\lambda}}. \]


In summary, the following is an improved procedure for estimating A, which 
in many cases yields a minimum variance unbiased estimator. 
Estimate \ by 


— C(n, t), 
n 


x” , otherwise. 


An extended table of C(n, ) would be quite useful. However, in order to obtain 
such a table it is necessary to devise a method for computing C(n, t) which does 


‘ : a ; ; a . 
not depend on entries in the table of S; , since, for example, S50 in Miksa’s 


table is an integer of forty-seven digits. The authors have been unable to do this. 
The following facts should be observed in comparing our estimator with the
estimator $\lambda^{*}$ of Plackett [7] and the maximum likelihood estimator $\hat\lambda$.
1. Plackett's estimator $\lambda^{*}$ and $\tilde\lambda_0$ are different, and $\lambda^{*}$ has exact variance


so it is obviously a function of $T_0$. A simple numerical calculation shows that
$\hat\lambda$ and $\tilde\lambda_0$ are different. Therefore, by uniqueness of unbiased estimators based
on $T_0$, we see that $\hat\lambda$ is a biased estimator of $\lambda$.

3. David and Johnson also showed that the asymptotic variance of $\hat\lambda$ is
\[ \frac{\lambda (1 - e^{-\lambda})^{2}}{n (1 - e^{-\lambda} - \lambda e^{-\lambda})}. \]


This is then also, for each fixed $n$, the Cramér-Rao lower bound for exact
variances of unbiased estimators. The following calculations show that there is no
unbiased estimator whose variance attains this lower bound: Let $J_n(x)$ denote
the joint density of $n$ independent truncated Poisson random variables, each
with density $g(x; \lambda, 0)$. Then, a necessary and sufficient condition for a
Cramér-Rao estimator to exist is that there exist a function $h(\lambda)$ such that the expression
\[ \lambda + h(\lambda)\, \frac{\partial \log J_n(x)}{\partial \lambda} \]
is independent of $\lambda$ for all values of $x$. It can be verified that
\[ \frac{\partial \log J_n(x)}{\partial \lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - \frac{n}{1 - e^{-\lambda}} \]

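and that no such function $h(\lambda)$ exists. Moreover, since $\tilde\lambda_0$ and $\lambda^{*}$ are different
functions of $t$, the variance of $\lambda^{*}$ must exceed that of $\tilde\lambda_0$. Consequently, we may
write (for each fixed $n$)
\[ \frac{\lambda (1 - e^{-\lambda})^{2}}{n (1 - e^{-\lambda} - \lambda e^{-\lambda})} < \operatorname{var}(\tilde\lambda_0) < \operatorname{var}(\lambda^{*}). \]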
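The three comparisons above lend themselves to a small numerical check. The sketch below is an illustration, not the authors' calculation. It assumes the standard likelihood equation for a zero-truncated Poisson sample, $t/n = \lambda/(1 - e^{-\lambda})$, solved here by bisection; it computes the variance of the minimum variance unbiased estimator by summing over the density of $T_0$, and the variance of Plackett's estimator from the single-observation moments of $X \cdot 1\{X \ge 2\}$. All function names and the values $n = 3$, $\lambda = 1.5$ are hypothetical choices.

```python
from fractions import Fraction
from math import comb, exp, factorial

def stirling2(t, n):
    """Stirling number of the second kind via Definition 1 (exact integer)."""
    if t < n:
        return 0
    total = sum((-1) ** k * comb(n, k) * k ** t for k in range(n + 1))
    return (-1) ** n * total // factorial(n)

def mvue(n, t):
    """Minimum variance unbiased estimate t * S(t-1, n) / S(t, n)."""
    return t * stirling2(t - 1, n) / stirling2(t, n)

def pmf_T0(t, n, lam):
    """Density of T_0: lam^t n! S(t, n) / ((e^lam - 1)^n t!)."""
    coeff = Fraction(factorial(n) * stirling2(t, n), factorial(t))
    return float(coeff) * lam ** t / (exp(lam) - 1.0) ** n

def moments_mvue(n, lam, t_max=150):
    """Mean and variance of the MVUE, truncating the sum at t_max."""
    mean = sum(pmf_T0(t, n, lam) * mvue(n, t) for t in range(n, t_max))
    second = sum(pmf_T0(t, n, lam) * mvue(n, t) ** 2 for t in range(n, t_max))
    return mean, second - mean ** 2

def var_plackett(n, lam, x_max=100):
    """Variance of the average of X * 1{X >= 2} over n truncated observations."""
    g = [lam ** x / (factorial(x) * (exp(lam) - 1.0)) for x in range(1, x_max)]
    first = sum(p * x for x, p in enumerate(g, start=1) if x >= 2)
    second = sum(p * x * x for x, p in enumerate(g, start=1) if x >= 2)
    return (second - first ** 2) / n

def cramer_rao_bound(n, lam):
    return lam * (1.0 - exp(-lam)) ** 2 / (n * (1.0 - exp(-lam) - lam * exp(-lam)))

def mle(n, t, tol=1e-12):
    """Bisection for lam / (1 - exp(-lam)) = t / n; requires t > n."""
    target = t / n
    lo, hi = 1e-9, target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - exp(-mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, lam = 3, 1.5
mean, v_mvue = moments_mvue(n, lam)
print("mean of MVUE:", mean)
print("bound, var(MVUE), var(Plackett):",
      cramer_rao_bound(n, lam), v_mvue, var_plackett(n, lam))
print("MLE vs MVUE at n = 2, t = 3:", mle(2, 3), mvue(2, 3))
```

For the chosen values the mean of the estimator reproduces $\lambda$, the three variances appear in the order given by the displayed inequality, and the maximum likelihood and minimum variance unbiased estimates visibly differ for the same $(n, t)$.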


4. The general case. The derivation of the distribution of $T_c$ and the minimum
variance unbiased estimator $\tilde\lambda_c$ proceeds in a manner analogous to the case
$c = 0$, except that here we use a multinomial expansion and generalized Stirling
numbers. More precisely, let


Then, $T_c$ will have characteristic function


20 os n e md ec a? n 
( ) = / : in . Leta a ae ; 
Vela (S- — _ {1 — F(ce)|" (. x x! ) 


After performing the multinomial expansion, employing the inversion formula, 
and evaluating the same types of integrals as before, we use Definition 3 and 
arrive at the following expression for the density of 7, : 


tric 
n!in@®@,.: 


eA"? t=n(e+1),nle+1)4+1,---. 
n(e@->%) 


0 J: 


p(t) = 


In the same way as before the condition of unbiasedness now yields 


Gat 
(SS 


nt 


r. (t) 


It is clear from Property 4 that for $c = 0$, $\tilde\lambda_c(t)$ reduces to the expression of
Section 3. Also, from Property 5 we see that


. OC cnidbdie: 
u(t) =t SS 


Crn,t —2n 


At the present time there appears to be only one available table (Jordan, p. 172) 
for evaluating C,,; . This table handles the estimation problem forn = 1, --- , 5, 
2Mn+1istsnt 6. 
One simple unbiased estimator for $\lambda$ is
\[ U_c(x) = \begin{cases} 0, & x = c + 1 \\ x, & x \ge c + 2. \end{cases} \]


Now we use the Lehmann-Scheffé-Blackwell method (see Lehmann [4]):
\[ \tilde\lambda_c(t) = E(U_c(X_1) \mid T_c = t) = \sum_{x=c+2}^{t-(c+1)(n-1)} x\, P(X_1 = x \mid T_c = t). \]


P(X, = «| T, = 0) canbe written as P(X, = x) PCS? 
and then simplified by the use of p.(f) to the form 


(SS, 1,¢ 
P(X, = 2\|T. =t) = 


n't 
Substituting this expression in the above, and equating the result with the 
earlier form of \.(t), we obtain the identity 
‘— (t-1 


GS, 1j= nO, 4 be 
(c+1)(n—1) J 


For c = 0 this reduces to a combination of Property 1 and Property 2. 
The natural unbiased estimator based on the whole sample, which may be
generated from $U_c$, is
\[ V_c(x_1, x_2, \cdots, x_n) = \frac{1}{n} \sum_{i=1}^{n} U_c(x_i). \]
$V_0$ is precisely Plackett's $\lambda^{*}$.


TABLE OF $10^5\, C(n, t) = 10^5 \left( 1 - \mathfrak{S}_{t-1}^{n-1} / \mathfrak{S}_t^n \right)$


C(n, t)\(n, ) C(n, t)\(n, t) Cin, Oi, Oo Cin, t)\(n, t) C(n, t)\(n, t) C(n, t)\ (nm, O 
| | 


~ 


Cin, t) 


to to to & & tS tS 


66667) (3, 89701) (3, 
85714) (3, 93478) (3, 
93333] (3, § 95802 
96774) (3, 97267 
98413) (3, 98207 
99213) (3, 98818 
99608) (3, 1: 99218) (4, 5) 40000 
99804) (3, 99481 | (4, 6 61538 : 
99902) (3, ) 99655 (4, 7 74286 (4, 99943) (5, 66667\(5, 3 99756) (5, 99997 
99951|(3, 16) 99771) (4, 82305 99958) (5, 9 75529 
99976) (3, 99847) (4, § 87568 99968) (5, 10) 81728 
99988) (3, 18 99898 (4, 91130 30) 99976) (5, 11 86177 
99994) (3, 19) 99932 (4, 93586 (4, 99982) (5, 12 89434) | 
99997| (3, 20) 99955) (4, 95339 3! 99987) (5, 13 91851 
99998] (3, 21) 99970) (4, 96583 (4, 33) 99990| (5, 14 93686 
99999) (3, 22) 99980, (4, 14) 97482 (4, $ (5, 15 95070 
(3, 23 99987 | (4, 98137 (4, 35) 99904) (5, 16 96136) 
50000|(3, 24 99991 (4, 16) 98618 36 99996) (5, 17 96961 
72000|(3, 2! 99994 (4, 98971 (4, 37) 99997] (5, 
83333| (3, 2 


99997 (4, 
99998 (4, 
(4, 


99428) (4, 3 99998 (5, 2 98497] (5, gags 
99572| (4, 99998) (5, 98807) ( 1 Q9987 
99680 (4, ) 5, 99052) (5, QO989 
99761) (4, 99999! (5, 23 99245) (5, 43 99991 


yo- 
oso 


99999 


Sew nwn 
© @ 


99999 99821) (4, Soeee 5, ; 99399) (5, 9999" 
99866 5, 28 99521/(5, 4! 99995 
99898) (5, 3} (5, 26 99618 


99925) (5, 7 f ‘ 99695 


I 
Bo 
Ame ow 


QQug8 


be pew & & wv 


99997 


~] 
’ 


99805) , Af Qg09s 
99844) (5, £ Q99GS8 
99876 

99901) (6, 28571 
99921! (6, 47368 
99936) (6, § 60317 
99949) (6, §9549 
99959] (6, 11 76307 
99968} 12 81360 
99974) (6, 13 85202 


999079; (5, 14 R8164 


RBRBNVHWKNSW 


to 


97602 
99996 (4, 99233 (4, ¢ 99998! (5, 19 O8104 


ADnAAGDaanaawn 
TABLE (Continued)
(n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t)





6, 15 90474)(7, 34 99368) (9, 12) 46666 (10, 34 96576) (12, 18 59241 (13, 43 96100) (15, 33 84943 














6, 16 92294)(7, 35 99460) (9, 13 55765| (10, 35) 96946)(12, 19)  64244/(13, 44 96427/ (15, 34 86196 
6, 17 93738) (7, 36 99558! (9, 14 63008) (10, 36 97273] (12, 20 68514) (13, 45 96726) (15, 35 87330 
6, 18 94893) 37) 99605 (9, 15) 68847) (10, 37 97564) (12, ; 72182/(13, 46 96998' (15, 36 88360 
6, 19 95822|:7, 38 99662| (4, 16 73607| (10, 38 97822| (12, 2% 75348) (13, 47 97246) (15, 37 89297 
(6, 20 96573/(7, 39) 99711/(9, 17 77523| (10, 39 98052) (12, 2: 78095| (13, 48 97473| (15, 38 90149 
6, 21) 97182\(7, 40 99753\(9, 18 771: (10, 40 98257} (12, : 80489/ (13, 49 97680! (15, 3u 90926 
6, 22 97678) (7, 41 99788 (9, 19) 83485, (10, 41 98439) (12, 2! 82582) (13, 50 97869\ (15, 40 91636 
6, 23 98084|(7, 42 99820 (9, 20 85765 (10, 42 98602) (12, 26 84419 n=14 15, 41 92285 
6, 24 98417| (7, 43 99845|(9, 21 87693) (10, 43 98747|(12, 27 86036) (14, 15 13333) (15, 42 92878 
6, 25 QS690\(7, 44 99867' (9, 22 89331' (10, 44) 98877\ (12, 28 87465) (14, 16 24419) (15, 43 93422 
6, 26 98915)(7, 45) 99886'(9, 23 90727) (10, 45) 98993} (12, 29 88730) (14, 17 33725) (15, 44 93921 
6, 27 99101, (7, 46 99902'(9, 24 91924, (10, 46 990906'(12, 30 89853/(14, 18 41607) (15, 45 94379 
6, 28 99254/(7, 47 99916) (9, 25 92952)| (10, 47 99189/(12, 31 90852) (14, 19 48331) (15, 4 94799 
6, 29 99381|(7, 48 99928) (9, 26 93838) (10, 48 99272\ (12, 32 91743 (14, 20 54107/(15, 47 95186 
6, 30 99485) (7, 49 99939) (9, 27 94605 (10, 49 99347; (12, 33 92539 (14, 21 59098, (15, 48 95542 
6, 31 99572)(7, 50 99947/(9, 28 95269) (10, 50 99413 (12, 34 93251/ (14, 22 63434) (15, 49 95870 
6, 32 99644 n= § 9, 29 95846 n=l11 12, 35 93890) (14, 23 67220/ (15, 50 96172 
6, 33 99704 (8, 9 22222'(9, 30 96348 (11, 12 16667/(12, 36 94463) (14, 24 70539 n= 16 
6, 34 99754) (8, 10 38400' (9, 31 96787) (11, 13 20864) (12, 37 94978! (14, 25 73462 21769 
6, 35 99795|(8, 11 50505' (9, 32 97170) (11, 14 40476) (12, 38 95442) 14, 26 76044 0341 
6, 36 99830) (8, 12 9, 33 97505)| (11, 15 49120) (12, 39 95860) (14, 27 78333 37735 
6, 37 99R5R) (8, 13 9, 34 97799, (11, 16 56240/ (12, 40 96238) (14, 28 80369 44153 
6, 38 YORK2) (8, 14 9, 35 98057) (11, 17 62162) (12, 41 96579) (14, 29) 82185 49754 
6, 39 99902) (8, 15 77229) (9, 36 98283) (11, 18 67127) (12, 42 96887 (14, 30) 83809 ( 54665 
6, 40 O9G18) (8, 16 80916)(9, 37 98482)(11, 19 71323) (12, 43 97166) (14, 31 85264 . 58991 
6, 41 99932/ (8, 17 83925 (9, 38 98658) (11, 20 74890) (12, 44 97419) (14, 32 86571) (16, 25 62818 
fi, 42 99943)(8, 18 86400/ (9, 39 98812)(11, 21 77942) (12, 45 97648) (14, 33 87747) (16, 26 66215 
6, 43 99953/ (8, 19 88451/(9, 40 98949| (11, 22 80565) (12, 46 97856| (14, 34 SS808' (16, 27 69242 
6, 44 99961/(8, 20 90159 (9, 41 99069) (11, 23 82831) (12, 47 98045) (14, 35 $9767| (16, 28 71946 
6, 45 99967 2 91590 (9, 42 99175) (11, 24 84797|(12, 48 98217' (14, 36 90635| (16, 29 74370 
6, 46 99973 22 92795| (9, 43 99269) (11, 25 86508) (12, 49 98373) (14, 37 91421) (16, 30 b 

i7 99977|(8, 23 93813\(9, 44 99352) (11, 2% §8003/(12, 50 98514 (14, 38 92135 (16, 31 78510 
6, 48 99981) (8, 24 94676) (9, 45 99426, (11, 27 89514 n= 13 14, 39 92783) (16 2 80281 
6, 49 Q99R4) (8, 25 95411) (9, 46 99491; (11, 28 90466) (13, 14 14286) (14, 40 93374 (16, 33 51884 
6, 50 99987|(8, 26 96037) (9, 47 99548) (11, 29 91481)(13, 15 28000' (14, 41 93912) (16, 34 83337 

n=7 8, 27 (9, 48 99599, (11, 30 92377) (13, 16 35714) (14, 42 94403 (16, 35 84657 
7, 8 25000) (8, 28 9, 49 99644) (11, 31 93171) 13, 17 43849) (14, 43 95851) (16, 3¢ 85859 
7,9 42424/(8, 29 9, 50 99684) (11, 32 93875) (13, 18 50719) (14, 44 95260' (16, 37 86954 
7, 10 55 & 20 n= 10 11, 33 94501/ (13, 19 56565) (14, 45 95635/ (16, 38 87953 
7, a 64326 (8, 31 10, 11 18182)(11, 34 95058/(13, 20 61573) (14, 46 95978)| (16, 39 88867 
7, 12 71392) (8, 32 7|(10, 12 32258) (11, 35 95555) (13, 21 65888) (14, 47 96293/ (16, 40 89703 
7, 13 76841) (8, 33 10, 13 43357) (11, 36 95999) (13, 22 69627/(14, 48 96581) (16, 41 90469 
a. ae 81104] 8, 34 10, 14 52242) (11, 37 96395) (13, 23 14, 49 96846 (16, 42 91173 
7, 18 84480) (8, 35 98890) (10, 15 59447) (11, 38 96750| (13, 24 (14, 50 97089 (16, 43 91819 
7, 16 87181) (8, 36 99033) (10, 16 65354/ (11, 39 97068) (13, 25 78224 n= 15 16, 44 92413 
(7, 17 89362) (8, 37 99157)(10, 17 70243) (11, 40 97354) (13, 26 80424) (15, 16 12500) (16, 45 92960 
(7, 18 91135) (8, 38 99265) (10, 18 74324) (11, 41 97610) (13, 27 82369) (15, 17 23018) (16, 46 93464 
7, 19 92586) (8, 39) 99359; (10, 19 77755) (11, 42 97841/ (13, 28 84093) (15, 18 31944/ (16, 47 93929 
7, 20 93780)(8, 40) 99441) (10, 20 80657)\11, 43 98048) (13, 29 85625) (15, 19 39578) (16, 48 94358 
(7, 21 94769) (8, 41) 99512| (10, 21 83127/(11, 44) 98235)(13, 30) 86991)(15, 20) 46150/(16, 49) 94754 
7, 22 95589} (8, 42 99574/(10, 22 85239! (11, 45) 98403! (13, 31 88211/(15, 21 51843/ (16, 50 95130 
7, 23) 96274) (8, 43 99628| (10, 23 87054| (11, 46 98555) (13, 32 89304) (15, 22 56801 n=17 
(7, 24 96846 | (8, 44 99675, (10, 24 88619) (11, 47 98691 | (13, 33 90283) (15, 23 61139) (17, 18 11111 
7, 25) 97327/(8, 45 99716 (10, 25 89975/(11, 48) 98815)(13, 34) 91164/(15, 24 64953/ (17, 19 20648 
(7, 26) 97731|(8, 46) 99752) (10, 26 91152] (11, dy 98926) (13, 35) 91957| (15, 25 68319|(17, 20) 28888 
(7, 27) 98072\(8, 47 99783) (10, 27) 92178)(11, 50)  99027\(13, 36) 92672) (15, 26 71301|(17, 21 36054 
(7, 28) 98360/(8, 48 99810|(10, 28) 93074] n= 12  |(13, 37) 93318|(15, 27) 73951|(17, 22) 42317 
(7, 29 98603) (8, 49 99834) (10, 29 93859'(12, 13)  15385)(13, 38 93902|(15, 28)  76314)(17, 23) 47820 
(7, 30 98810|(8, 50 99855 (10, 30 94548 (12, 14) 27799) (13, 39 94430 (15, 29 78428) (17, 24 52676 
(7, 31) 98985 n=9 (10, 31) 95154)(12, 15 37949/(13, 40) 94910 (15, 30)  80322/(17, 25 56979 
(7, 32) 99134/(9, 10 20000 (10, 32 95688) (12, 16) 46340) (13, 41) 95345 (15, 31 82025| (17, 26 60807 
(7, 33) 99260) (9, 11 35065 (10, 33 
38 7
9 BS] 
40) 41662 
Ai 44782 
42 47693 
4 5OALS 
44 52058 
45 55341 
47 OH
48 6164 
n = 32

$3 06061 
4 1164 
5 16797 
6 21562 
37 25977 
8 30075 
39 SSRSS 


40 37429 





42 $3820 
43 46704 
it 40404 
15 519 
4 5430 
1 56554 
4% 5862 
49 Hono 
m0) 457 
rn = 33 
4 
6 
i] 23 
0 20344 
40) 33090 
41 36583 
42 SO846 
3 42897 


i4 45754 
45 48432 
46 40046 


7 53307 
18 55528 
49 57619 
50 59589 
35 05714


6 11003 
7 15907 
8 20462 
39 24700 
40 28648 
41 32332 
42 35774 
(n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t) (n, t) C(n, t)
- — 
34, 43 38995) (45, 48 40075 (37, 40 147 s7| (38, 38 38800) (40, 46 25076) (42, 48 24074) (45, 48 12319 
44, 44 42011)(35, 49 51411)| (37, 41 19008! 38, 49 41517\ (40, 47 28422/(42, 49 27320) (45, 49 15977 
44, 45 44840) (35, 50 53613) (37, 42 23002) (38, 50 44981) (40, 48 1578/(42, 50 30387\(45, 50 19437 
4, 46 $7497 " 6 7, 43 26744 n = 39 10, 49 34557 n= 43 | , 4 
(34, 47 49993) (36, 37 05405) (37, 44 30253) (39, 40 05000) (40, 50 37369) (43, 44) 04545) (46, 47 04255 
4, 48 52343) (36, 38 10430) (37, 45 33547) (39, 41 09673 n 41 (43, 45 08821 (46, 48 0827 
34, 49 54555| (36, 39)  15107|)(37, 46 36643) (39, 42 14048) (41, 42 04762! (43, 46 12846) (46, 49 12071 
4, 50 56 10) 36, 40 19469 47 39557! (39, 43 18147/ (41, 43 (43, 47 16640) (46, 50 15664 
n= 35 36, Al 23542 is 42302) (39, 44 21994, (41, 44 43, 48 20221 n= 47 
5, 36 05556). 36, 42 27350) (37, 49 44884) (30, 45 25608' (41, 45 17361) (43, 49 23602' (47, 48 OAL 
35, 37 10709) (36, 43 30916) (37, 50 47332) (39, 46 20007) (41, 46 21070) (43, 50 26800) (4 49 O8 106 
5, 38 15497/ (56, 44 34258 n 3S 9, 47 32208) (41, 47 24565 n 44 17, 50 11833 
35, 39 190953) (36, 45 37395) (38, 39 05128) (39, 48 35225/(41, 48 27860 (44, 45 O4444 n 48 
35, 40 241071 (36, 46 40343) (38, 40) 09913) (39, 49 $8071) (41, 49 30971) (44, 46 08630) (48, 49 O4A082 
(35, 41 27984) 36, 47 43116) (38, 41 14384) (39, 50 40760 (41, 50 $3910, (44, 47 12577 (48, 50 O7945 
35, 42 $1608) (36, 48 45727| (38, 42 18567 n= 40 n 42 (44, 48 16302 n= 49 
(35, 43)  35000/(36, 49)  48188/(38, 43)  22487/(40, 41) O4878\(42, 43) 04651/(44, 49) 19823/(49, 50) 04000 
35, 44 $8179) (36, 5O 50510) (58, 44 26164) (40, 42 00445, (42, 44 09019) (44, 50 23149 
35, 45 41163 n 37 38, 45 29617) (40, 43 13727] 42, 45 13127 ’ 15 
35, 4f $5962) (37, 38 05263, (38, 46 32864/(40, 44 17745) (42, 46 16993, (45, 46 O4548 
35, 47 16596) (37, 39 10165 (38, 47 55920) (40, 45 21522) (42, 47 20637) (45, 47 OS 445 
ON BALANCING IN FACTORIAL EXPERIMENTS¹
By B. V. Shah
University of Bombay


1. Introduction and Summary. R. C. Bose [1] has considered the problem of 
balancing in symmetrical factorial experiments. In all the designs considered in 
that paper, the block size is a power of S, the number of levels of a factor. The 
purpose of the present paper is to consider a general class of designs, where a 
‘complete balance’ is achieved over different effects and interactions. It is proved 
in this paper (Theorems 4.1 and 4.2) that if a ‘complete balance’ is achieved over 
each order of interaction, the design must be a partially balanced incomplete 
block design. Its parameters are found. The usual method of analysis (of a 
PBIB design [2]) which is not so simple, can be simplified a little for these designs 
(section 5), on account of the balancing of the interactions of various orders. The 
simplified method of analysis is illustrated by a worked out example 5.1. Finally, 
the problem of balancing is dealt with for asymmetrical factorial experiments 
also. Incidentally, it may be observed that the generalised quasifactorial designs 
discussed by C. R. Rao [4] are the same as found by the author, from con- 
siderations of balancing. 


2. Some lemmas regarding C-matrix and orthogonal contrasts. Let there be $v$
treatments replicated $r_1, r_2, \ldots, r_v$ times respectively, in $b$ blocks of $k$ plots each.
Let $n_{ij}$ be the number of times the $i$th treatment occurs in the $j$th block
($i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, b$). Then $N = [n_{ij}]$ is the incidence matrix of the
design. It is assumed that every $n_{ij}$ is either zero or one. The set up assumed is
that the yield of a plot in the $j$th block having the $i$th treatment is $\mu + a_j + t_i + \epsilon_{ij}$,
where $\mu$ is the over-all effect, $a_j$ is the effect of the $j$th block, $t_i$ is the effect of
the $i$th treatment and $\epsilon_{ij}$ is the experimental error. The $\epsilon_{ij}$'s are assumed to be
independent normal variates with zero mean and variance $\sigma^2$. Let $Q_i$ be the
adjusted treatment yield (adjusted for block effects) of the $i$th treatment, and
$\hat t_i$ be a solution for $t_i$ of the least square equations. Let $Q$, $t$ and $\hat t$ denote the
column vectors $(Q_1, Q_2, \ldots, Q_v)$, $(t_1, t_2, \ldots, t_v)$, and $(\hat t_1, \hat t_2, \ldots, \hat t_v)$ respectively.

It is well known that 


(2.1)    $Q = C\hat t$

and the variance-covariance matrix of $Q$ is

(2.2)    $\sigma^2 C$,

where

(2.3)    $C = \mathrm{diag}(r_1, r_2, \ldots, r_v) - \frac{1}{k} N N'$,

and $\mathrm{diag}(r_1, r_2, \ldots, r_v)$ stands for a diagonal matrix, with diagonal elements
$r_1, r_2, \ldots, r_v$.
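As a small illustration of (2.3), the sketch below builds the C-matrix of a toy design with exact rational arithmetic. The design (three treatments in three blocks of two plots) and the helper name `c_matrix` are assumptions made for this example only; they do not come from the paper.

```python
from fractions import Fraction


def c_matrix(blocks, v):
    """Return C = diag(r_1, ..., r_v) - (1/k) N N' for blocks of equal size k."""
    k = len(blocks[0])
    # Every n_ij must be 0 or 1, so a block may not repeat a treatment.
    assert all(len(set(b)) == k for b in blocks)
    # Incidence matrix N: n[i][j] = 1 if treatment i occurs in block j.
    n = [[1 if i in b else 0 for b in blocks] for i in range(v)]
    r = [sum(row) for row in n]  # replications r_i
    c = []
    for i in range(v):
        row = []
        for j in range(v):
            nn = sum(n[i][t] * n[j][t] for t in range(len(blocks)))  # (N N')_ij
            row.append((r[i] if i == j else 0) - Fraction(nn, k))
        c.append(row)
    return c


# Hypothetical toy design: v = 3 treatments, b = 3 blocks, k = 2 plots per block.
C = c_matrix([(0, 1), (0, 2), (1, 2)], v=3)
for row in C:
    print([str(x) for x in row])
```

Each row of `C` sums to zero, which is the property $E_{1v}C = 0$ used in the proof of Lemma 2.1.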

If $l'l = 1$, the contrast $l't$ will be called a normalised contrast.
LEMMA 2.1. Let $l_1't, l_2't, \ldots, l_{v-1}'t$ be $v - 1$ estimable normalised orthogonal
contrasts (the $l_i$'s are $v$-vectors), such that

(2.4)    $V(l_i'\hat t) = \sigma^2/\theta_i$,
(2.5)    $\mathrm{Cov}(l_i'\hat t, l_j'\hat t) = 0$,  $i \neq j$;

then (i) the C-matrix defined in (2.3) is given by

(2.6)    $C = \sum_{q=1}^{v-1} \theta_q l_q l_q'$;

(ii) the estimate of $l_i't$ is given by

(2.7)    $l_i'\hat t = l_i'Q/\theta_i$.


PROOF. Let $E_{mn}$ denote an $m \times n$ matrix, all the elements of which are unity, and

(2.8)    $L = [\,l_1 \mid l_2 \mid \cdots \mid l_{v-1} \mid \tfrac{1}{\sqrt v} E_{v1}\,] = [\,L_1 \mid \tfrac{1}{\sqrt v} E_{v1}\,]$,

(2.9)    $LL' = I_v = L'L$,

where $I_v$ denotes a $v \times v$ identity matrix. From (2.1) and (2.9) we have

$Q = CLL'\hat t$,

(2.10)    $L'Q = L'CL(L'\hat t)$,

but

(2.11)    $E_{1v}Q = 0$ and $E_{1v}C = 0$;

hence (2.10) reduces to

(2.12)    $L_1'Q = L_1'CL_1(L_1'\hat t)$.

From (2.2) it follows that the variance-covariance matrix of $L_1'Q$ is

(2.13)    $L_1'CL_1\,\sigma^2$.

By hypothesis each of $l_1't, \ldots, l_{v-1}'t$ is estimable, therefore $(L_1'CL_1)$ must have
rank $v - 1$. Hence its inverse exists, and

(2.14)    $(L_1'\hat t) = (L_1'CL_1)^{-1} L_1'Q$

and

(2.15)    $V(L_1'\hat t) = (L_1'CL_1)^{-1}\sigma^2$.


Comparing with (2.4) we have

(2.16)    $(L_1'CL_1)^{-1} = \mathrm{diag}(1/\theta_1, 1/\theta_2, \ldots, 1/\theta_{v-1})$,

(2.17)    $L_1'CL_1 = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_{v-1})$.

(2.11) and (2.17) imply that $\theta_1, \theta_2, \ldots, \theta_{v-1}, 0$ are canonical roots of $C$, and
$l_1, l_2, \ldots, l_{v-1}, (1/\sqrt v)E_{v1}$ are corresponding canonical vectors. Hence $C$ is
given by

(2.18)    $C = \sum_{q=1}^{v-1} \theta_q l_q l_q'$.

Also from (2.14) and (2.16) it follows

(2.19)    $L_1'\hat t = \mathrm{diag}(1/\theta_1, 1/\theta_2, \ldots, 1/\theta_{v-1})\,L_1'Q$.

This proves (2.7).
LEMMA 2.2. In case some of the $\theta$'s in Lemma 2.1 are equal, say $\theta_1 = \theta_2 = \cdots = \theta_r = \theta$,
then there will be infinitely many sets of normalised orthogonal vectors
corresponding to the canonical root $\theta$. The variance-covariance matrix of contrasts
corresponding to any such set will be

$\frac{\sigma^2}{\theta} I_r$

and the representation of $C$ as given by Lemma 2.1 is unique; i.e. if $l_1, \ldots, l_r$
and $n_1, \ldots, n_r$ are any two sets, then

$\sum_{i=1}^{r} l_i l_i' = \sum_{i=1}^{r} n_i n_i'$.

The proof follows easily from observing that

(2.20)    $[\,n_1 \mid n_2 \mid \cdots \mid n_r\,] = [\,l_1 \mid l_2 \mid \cdots \mid l_r\,]\,A$,

where $A$ is an $r \times r$ orthogonal matrix.


3. Definition of ‘complete balance’. In a factorial experiment with $m$ factors
$F_1, F_2, \ldots, F_m$ each at $S$ levels, if the treatments are denoted by $(x_1, x_2, \ldots, x_m)$
where $x_i$ is the level of the $i$th factor ($x_i = 0, 1, 2, \ldots, S - 1$); then a contrast
$\sum C_{x_1, x_2, \ldots, x_m}\, t(x_1, x_2, \ldots, x_m)$ (summation is over all $x_1, x_2, \ldots, x_m$) belongs
to the $(q - 1)$th order interaction between the factors $F_{i_1}, F_{i_2}, \ldots, F_{i_q}$,
if $C_{x_1, x_2, \ldots, x_m}$ depends only on $x_{i_1}, x_{i_2}, \ldots, x_{i_q}$ and $\sum C_{x_1, x_2, \ldots, x_m}$,
summed over the levels of any one of these $q$ factors, is zero.

Bose [1] has defined balance over a particular order of interaction in symmetric 
factorial experiments. In general, that definition is not interpretable, e.g. when 
the number of levels S is not a power of a prime, or the block size is not a power of
S. So a more general definition is necessary.

DEFINITION 3.1. We shall define that a ‘complete balance’ is achieved over a
set of $n$ normalised orthogonal contrasts $l_1't, \ldots, l_n't$ if and only if the
variance-covariance matrix of their estimates is

$\frac{\sigma^2}{\theta} I_n$.

DEFINITION 3.2. A more obvious definition of ‘complete balance’ over a set of
vectors or contrasts represented by them is that every linear combination of
these vectors giving a normalised contrast is estimated with the same variance,
say $\sigma^2/\theta$.

THEOREM 3.1. The two Definitions 3.1 and 3.2 are equivalent.

We will now say that complete balance is achieved over the $(q - 1)$th order of
interaction if a complete set of $\binom{m}{q}(S - 1)^q$ normalised orthogonal contrasts has
variance-covariance matrix $(\sigma^2/\theta_q) I$, or if every normalised contrast belonging
to the $q$ factor interaction is estimated with the same variance $\sigma^2/\theta_q$.


4. Balanced factorial designs and PBIB. Let there be $m$ factors each at $S$
levels in a symmetric factorial experiment. Let $L_q$ be the $S^m \times \binom{m}{q}(S - 1)^q$ matrix
formed by a complete set of $\binom{m}{q}(S - 1)^q$ normalised orthogonal vectors forming
$q$ factor interactions, with the variance of the estimate of any normalised contrast
belonging to a $q$ factor interaction equal to $\sigma^2/\theta_q$; $q = 1, 2, \ldots, m$. Further let us
assume that the covariance between the estimates of any two contrasts belonging
to the $i$th and the $j$th ($i \neq j$) orders of interactions is zero.

From Lemmas 2.1 and 2.2 $C$ is uniquely represented and given by

(4.1)    $C = \sum_{q=1}^{m} \theta_q L_q L_q'$,

which can also be written as

(4.2)    $C = [c_{ij}]$,  $c_{ij} = \sum_{q=1}^{m} \theta_q f^q_{ij}$,

where $f^q_{ij}$ is the element of $L_q L_q'$ corresponding to the $i$th row and $j$th column.

Let the $i$th and $j$th treatments be $(x_1, x_2, \ldots, x_m)$ and $(y_1, y_2, \ldots, y_m)$
respectively, and let

$(0, 0, \ldots, 0)$ and $(0, 0, \ldots, 0, 1, 1, \ldots, 1)$  (with $p$ zeros followed by $m - p$ ones)

be the $r$th and $s$th treatments respectively. In the $i$th and $j$th treatments suppose
exactly $p$ factors occur at the same level. Say $x_{i_1} = y_{i_1}, x_{i_2} = y_{i_2}, \ldots,
x_{i_p} = y_{i_p}$, and the rest of the $x_i$'s are not equal to the corresponding $y_i$'s. Now
interchange the levels $x_1, x_2, \ldots, x_m$ with zeros, i.e., in any treatment if the $i$th
factor occurs at level $x_i$ replace it by zero and if it occurs at level zero replace
it by $x_i$. Perform this change for all the treatments. So naturally $y_{i_1}, y_{i_2}, \ldots,
y_{i_p}$ will be changed to zeros. Now in the same manner as the $x_i$'s, interchange the
remaining levels $y_i$'s with ones. After these interchanges call the $i_1$th factor
the first factor, the $i_2$th factor the second factor, $\ldots$, and lastly the $i_p$th factor
the $p$th factor and the other $(m - p)$ factors the $(p + 1)$th to $m$th factors; and
rewrite all the treatments accordingly. Then it is obvious that the $i$th treatment
becomes $(0, 0, \ldots, 0)$ and the $j$th treatment becomes

$(0, 0, \ldots, 0, 1, 1, \ldots, 1)$  (with $p$ zeros followed by $m - p$ ones).


It is obvious that interchanges of levels or renaming the levels of any factor 


does not alter the order of an interaction; so also the permutation or renaming of 
factors. Hence the above changes will not alter the order of any interaction. 
After renaming the treatments arrange them in the original order. This will 
mean permutation of rows of $L_q$. Let the rearranged matrix be $M_q$. Then the
$r$th row of $M_q$ is the $i$th row of $L_q$ and the $s$th row of $M_q$ is the $j$th row of $L_q$.
Let $L_qL_q' = [l_{ij}]$ and $M_qM_q' = [m_{ij}]$, $i, j = 1, 2, \ldots, S^m$. Then it is evident that

(4.3)    $l_{ij} = m_{rs}$.

It is easy to see that $M_q$ also gives a complete set of normalised orthogonal
contrasts belonging to the $(q - 1)$th order or $q$-factor interactions. Hence from
Lemma 2.2

(4.4)    $L_qL_q' = M_qM_q'$,  i.e.  $l_{rs} = m_{rs}$.

Hence

(4.5)    $l_{ij} = l_{rs}$.

This shows that $f^q_{ij}$ depends only on the exact number of factors, say $p$,
which occur at the same level in both the $i$th and $j$th treatments. Let us denote it
by $f^q_p$, $p = 0, 1, \ldots, m$; $p = m$ denotes all levels equal ($i = j$) and $f^q_m$ is a
diagonal element.

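Equating the two forms of $C$, (2.3) and (4.2), with $v = S^m$, we obtain

(4.6)    $\mathrm{diag}(r_1, r_2, \ldots, r_v) - \frac{1}{k} NN' = \Big[\sum_{q=1}^{m} \theta_q f^q_{ij}\Big]$.

Equating the elements we get

(4.7)    $r_i - \frac{r_i}{k} = \sum_{q=1}^{m} \theta_q f^q_{ii}$

and

(4.8)    $-\frac{\lambda_{ij}}{k} = \sum_{q=1}^{m} \theta_q f^q_{ij}$,  $i \neq j$,

where $\lambda_{ij}$ equals the number of times the $i$th and $j$th treatments occur together.
Using (4.5), (4.7) and (4.8) we have

(4.9)    $\frac{r_i(k - 1)}{k} = \sum_{q=1}^{m} \theta_q f^q_m = \frac{r(k - 1)}{k}$, say,

and if the $i$th and $j$th treatments have $p$ factors at the same level,

(4.10)    $-\frac{\lambda_{ij}}{k} = \sum_{q=1}^{m} \theta_q f^q_p = -\frac{\lambda_p}{k}$, say.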
k q=l k 
Now (4.9) and (4.10) imply that the design must be a partially balanced in- 
complete block design. The definition of P.B.I.B. was first given by Bose and 
Nair [2] and later generalised by Nair and Rao [3]. 
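Parameters $\theta$, $k$, $r$ being selected to satisfy combinatorial properties of the
design and $v = S^m$, the $p$th associates of any treatment will be all the treatments
which have exactly $p$ factors at the same level as in the given treatment. Hence

(4.11)    $n_p = \binom{m}{p}(S - 1)^{m-p}$,  $p = 0, 1, \ldots, m - 1$,

and

(4.12)    $p^k_{ij} = \sum_u \binom{k}{u}\binom{m-k}{i-u}\binom{m-k-i+u}{j-u}(S - 1)^{k-u}(S - 2)^{m-k-i-j+2u}$,

where the summation extends over all the values of $u$ which are less than or equal
to the minimum of $k$, $i$, $j$ and for which $m + 2u \geq k + i + j$. Parameters
$\lambda_0, \lambda_1, \ldots, \lambda_{m-1}$ are given by

(4.13)    $\begin{pmatrix} f^0_0 & f^1_0 & \cdots & f^m_0 \\ f^0_1 & f^1_1 & \cdots & f^m_1 \\ \vdots & & & \vdots \\ f^0_m & f^1_m & \cdots & f^m_m \end{pmatrix} \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_m \end{pmatrix} = -\frac{1}{k} \begin{pmatrix} \lambda_0 \\ \lambda_1 \\ \vdots \\ \lambda_m \end{pmatrix}$,

where $\lambda_m = -r(k - 1)$,

$f^0_p = \frac{1}{S^m}$  for $p = 0, 1, \ldots, m$,

and $\theta_0$ is a dummy parameter always equal to zero, introduced to simplify the
inverse relation. (4.13) can be shortly written as

$F(m)\cdot\theta(m) = -\frac{1}{k}\lambda(m)$.

As will be shown later in section 7, the inverse relation of (4.13) exists and
can be written as

(4.14)    $\theta(m) = -\frac{1}{k}[F(m)]^{-1}\lambda(m)$.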


Therefore it also follows that in every P.B.I.B. with parameters as given above 
‘complete balance’ over each order of interaction is achieved. 

Hence we have the following theorems. 

THEOREM 4.1. Every P.B.I.B. design with parameters as given in (4.11) and
(4.12) achieves a ‘complete balance’ over each order of interaction.
THEOREM 4.2. If in a design

(i) ‘complete balance’ is obtained over each order of interaction;

(ii) covariance between the estimates of any two contrasts belonging to different
orders of interactions is zero; and

(iii) the number of plots is the same in every block; then the design must be a
P.B.I.B. with parameters given above.
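The association-scheme parameters of (4.11) and (4.12) can be checked by direct enumeration for small $m$ and $S$. The sketch below is only an illustration under the reading that (4.12) is $p^k_{ij} = \sum_u \binom{k}{u}\binom{m-k}{i-u}\binom{m-k-i+u}{j-u}(S-1)^{k-u}(S-2)^{m-k-i-j+2u}$; the function names are assumptions introduced here, not the paper's.

```python
from itertools import product
from math import comb


def n_p(m, S, p):
    """(4.11): number of treatments agreeing with a given one on exactly p factors."""
    return comb(m, p) * (S - 1) ** (m - p)


def p_kij(m, S, k, i, j):
    """(4.12): for two k-th associates, count the treatments that are i-th
    associates of the first and j-th associates of the second."""
    total = 0
    for u in range(min(k, i, j) + 1):
        # Skip terms excluded by the summation conditions (they contribute zero).
        if m + 2 * u < k + i + j or i - u > m - k:
            continue
        total += (comb(k, u) * comb(m - k, i - u) * comb(m - k - i + u, j - u)
                  * (S - 1) ** (k - u) * (S - 2) ** (m - k - i - j + 2 * u))
    return total


def agree(a, b):
    """Number of factors on which two treatments share the same level."""
    return sum(x == y for x, y in zip(a, b))


def brute_p_kij(m, S, k, i, j):
    """Same count as p_kij, obtained by listing all S**m treatments."""
    alpha = (0,) * m
    beta = (0,) * k + (1,) * (m - k)  # agrees with alpha on exactly k factors
    return sum(1 for g in product(range(S), repeat=m)
               if agree(g, alpha) == i and agree(g, beta) == j)


print(p_kij(3, 3, 1, 1, 1), brute_p_kij(3, 3, 1, 1, 1))
```

The enumeration uses the relabelling argument of this section: any pair of $k$th associates can be taken to be $(0, \ldots, 0)$ and a treatment with $k$ zeros followed by ones.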


COROLLARY 4.2.1. In any design with $S$ treatments, if complete balance is achieved
over all contrasts then the C-matrix is of the form given by

(4.15)    $C = a\left(I_S - \frac{1}{S} E_{SS}\right)$

and if the block size is the same for all the blocks, then the design must be a balanced
incomplete block design.


From (4.15) it follows that if m = 1, 


(4.16) 


and hence 


(4.17)    $F(1) = \frac{1}{S}\begin{pmatrix} 1 & -1 \\ 1 & S - 1 \end{pmatrix}$.


5. Analysis. Let us consider a symmetrical factorial design which is a P.B.I.B. 
of the type defined in section 4. Then as in (4.1) 


(5.1)    $C = \sum_{q=1}^{m} \theta_q L_q L_q'$,

where the $\theta$'s are given by (4.14) as

(5.2)    $\theta(m) = -\frac{1}{k}[F(m)]^{-1}\cdot\lambda(m)$.


Hence if $l't$ is any normalised contrast belonging to the $(q - 1)$th order
interaction, applying Lemma 2.1 we have

(5.3)    $l'\hat t = l'Q/\theta_q$,

(5.4)    $V(l'\hat t) = \sigma^2/\theta_q$,

(5.5)    S.S. due to $l't = (l'Q)^2/\theta_q$.

Now if $T_i$ is the yield of the $i$th treatment, and $T$ is a column vector
$(T_1, T_2, \ldots, T_v)$, and we suppose that the experiment is a randomised block
design with $r$ replications, then

(5.6)    $l'\hat t = l'T/r$,

(5.7)    $V(l'\hat t) = \sigma^2/r$,

and

(5.8)    S.S. due to $l't = (l'T)^2/r$.

? 

Hence by comparing (5.3), (5.4) and (5.5) with (5.6), (5.7) and (5.8) respec- 
tively; we obtain the following procedure for analysis: 

(i) calculation of Q 

(ii) calculation of sums of squares for each order of interaction separately, 
as if it were a randomised block experiment but using Q in place of T 

(iii) calculation of the $\theta_q$'s by using (5.2)

(iv) correcting the S.S. obtained in (ii) by the $\theta_q$'s instead of by $r$.
If we have a quasifactorial experiment or if it is necessary for some purpose, we 
will require estimates of individual treatment effects and variances of ele- 
mentary treatment comparisons. For that we know by (2.19), 


(5.9)    $L_q'\hat t = \frac{1}{\theta_q} L_q'Q$.

Hence

(5.10)    $\sum_{q=1}^{m} L_qL_q'\hat t = \Big[\sum_{q=1}^{m} \frac{1}{\theta_q} L_qL_q'\Big] Q$.

Since

$\big[\,L_1 \mid L_2 \mid \cdots \mid L_m \mid \tfrac{1}{\sqrt{S^m}} E_{S^m 1}\,\big]$

is an orthogonal matrix, (5.10) simplifies to

(5.11)    $\Big[I_v - \frac{1}{v} E_{vv}\Big]\hat t = \Big[\sum_{q=1}^{m} \frac{1}{\theta_q} L_qL_q'\Big] Q$,

where $v = S^m$. Put $E_{1v}\hat t = 0$ and we obtain a solution given by

(5.12)    $\hat t = \Big[\sum_{q=1}^{m} \frac{1}{\theta_q} L_qL_q'\Big] Q = MQ$, say.

Let $U_p$ be defined as follows:

(5.13)    $\begin{pmatrix} U_0 \\ U_1 \\ \vdots \\ U_m \end{pmatrix} = F(m)\begin{pmatrix} 0 \\ 1/\theta_1 \\ \vdots \\ 1/\theta_m \end{pmatrix}$.

Then as in (4.5) $U_0, U_1, \ldots, U_m$ are the elements of $M$. The element in the
$i$th row and $j$th column is $U_p$ if the $i$th and $j$th treatments have exactly $p$ factors
at the same level. Hence (5.12) simplifies to

(5.14)    $\hat t_i = U_m Q_i + \sum_{l=0}^{m-1} U_l S_l(Q_i)$,

where $S_l(Q_i)$ is the sum of the $Q_j$'s corresponding to the treatments which are $l$th
associates of $t_i$ as defined in (4.11). From solutions (5.14) it is easy to see that, if
$t_i$ and $t_j$ are $p$th associates,

(5.15)    $V(\hat t_i - \hat t_j) = 2\sigma^2(U_m - U_p)$.
EXAMPLE 5.1. Consider an example with two factors A and B, each at three
levels
b=6 K=6 


No = 


Block No Treatments 


(1 0 (2 0) (0 1) 
(0 0 (1 0) a 1 
(0 0 (2 0) 
(1 0) (2 0) 
(0 0) (2 0) 
(0 0 (0 1) 


> — 
ot to 


to ht WwW b&b be bo 
to et 


») 
=} 
» 


Using the formulas in section 7. 


F(2) —2 
- 4 
1 
i 
1 
[ oO 
=| 4 


[A é é iif : 
A, = = r ] ; 
05 —2 | —20 


Let Q,; denote adjusted treatment yield of (77) and 


Qj = > Qi; 


1=0 


Q. = + Qi. 


I=0 


Apply (5.2) 







Main effect of A > Qi./4.3. 


t=() 


Main effect of B > Q*,/4.3. 


j=0 


2 2 
Interaction AB = 2(x Qi; ~ &% _ 2G.) 


—1/42 
F(2) /6 / —1/28 |. 


Hence using (5.14) 
i, = FQ) — PeSo(Q;) — a'pSi(Q,) 
and using (5.17) we get 
Vii; — &) = ye" if t; and t; are Oth associates; 
1 


= je otherwise. 


6. $S_1^{m_1} \times S_2^{m_2} \times \cdots \times S_h^{m_h}$ Factorial experiment. Some matrix operators are defined
to derive certain further results.
Operator ‘×’ denotes the Kronecker product of matrices defined by

$A \times B = [a_{ij}] \times B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$.

The operator ‘⊗’ denotes the symbolic Kronecker product of suffixes defined by


the following illustrations. 
| 
do do Nor 
LsJe [x] 


‘I 
| 


601 


Oo |. 
610 


“(eI 
” 
THEOREM 6.1. If in a $S_1^{m_1} \times S_2^{m_2} \times \cdots \times S_h^{m_h}$ factorial experiment

(i) any contrast belonging to the interaction involving $q_i$ factors at $S_i$ levels
($i = 1, 2, \ldots, h$) is estimated with the same variance, say $\sigma^2/\theta_{q_1 q_2 \cdots q_h}$;

(ii) the estimates of all effects and interactions are all uncorrelated; and

(iii) the block size is a constant equal to $k$, say; then the design must be a PBIB
with relevant parameters, and conversely.

If any two treatments have exactly $p_i$ factors (each at $S_i$ levels) at the same
level for $i = 1, 2, \ldots, h$, they will be called $(p_1 p_2 \cdots p_h)$th associates. Then
we have

(6.4)    $n_{p_1 p_2 \cdots p_h} = \prod_{i=1}^{h} \binom{m_i}{p_i}(S_i - 1)^{m_i - p_i}$

and the relations between the $\theta$'s and $\lambda$'s are

(6.5)    $F(m_1) \times F(m_2) \times \cdots \times F(m_h)\cdot\theta(m_1) \otimes \theta(m_2) \otimes \cdots \otimes \theta(m_h) = -\frac{1}{k}\,\lambda(m_1) \otimes \lambda(m_2) \otimes \cdots \otimes \lambda(m_h)$,

(6.6)    $\theta(m_1) \otimes \theta(m_2) \otimes \cdots \otimes \theta(m_h) = -\frac{1}{k}\,[F(m_1)]^{-1} \times [F(m_2)]^{-1} \times \cdots \times [F(m_h)]^{-1}\cdot\lambda(m_1) \otimes \lambda(m_2) \otimes \cdots \otimes \lambda(m_h)$,

where $\theta_{0, 0, \ldots, 0} = 0$ and $\lambda_{m_1, m_2, \ldots, m_h} = -r(k - 1)$.

PROOF. The theorem can be proved for $h = 2$ exactly on the same lines as
section 4, and relation (6.5) can be obtained by noting that the matrix representing
an interaction of $(q_1 + q_2)$ factors out of $m_1 + m_2$ factors can be expressed as the
Kronecker product of two matrices representing interactions of $q_1$ and $q_2$ factors,
out of $m_1$ and $m_2$ factors respectively, and then using properties of the Kronecker
product of matrices. The result can be easily generalised for any value of $h$.
(6.5) and (6.6) can be used to simplify the analysis of many asymmetrical factorial 
experiments. For example the design of plan 6.9 of Cochran and Cox [11] has 
parameters v = 3.2’, b=6, r=3, k = 6 and Aw=1, Aw =3, An = 2, An = O, 
Noe = 1, Ave = —15; hence 6’s can be calculated as 0, = 40 610 = 3 and Oe 
8/3, O2 = 5/3 and the analysis can be performed as in section 5. 


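7. Evaluation of $F(m)$ and $[F(m)]^{-1}$. Put $m_1 = m_2 = \cdots = m_h = 1$ in (6.7)
and write $F(m_i)$ as $F_i(1)$ to avoid ambiguity. Then (6.7) becomes

(7.1)    $F_1(1) \times F_2(1) \times \cdots \times F_h(1)\cdot\theta(1) \otimes \theta(1) \otimes \cdots \otimes \theta(1) = -\frac{1}{k}\,\lambda(1) \otimes \lambda(1) \otimes \cdots \otimes \lambda(1)$.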


From (4.17) we have

(7.2)    $F_i(1) = \frac{1}{S_i}\begin{pmatrix} 1 & -1 \\ 1 & S_i - 1 \end{pmatrix}$.

Hence

(7.3)    $[F_i(1)]^{-1} = \begin{pmatrix} S_i - 1 & 1 \\ -1 & 1 \end{pmatrix}$.


Hence (7.1) and its inverse relation can be written as

(7.4)    $\lambda_{d_1 d_2 \cdots d_h} = -\frac{k}{\prod_{i=1}^{h} S_i} \sum \prod_{i=1}^{h} G_i(c_i d_i)\,\theta_{c_1 c_2 \cdots c_h}$

and

(7.5)    $\theta_{d_1 d_2 \cdots d_h} = -\frac{1}{k} \sum \prod_{i=1}^{h} H_i(c_i d_i)\,\lambda_{c_1 c_2 \cdots c_h}$,

where $c_i$ and $d_i$ take values 0 or 1; the summation is over all the values of
$(c_1 c_2 \cdots c_h)$ and

$G_i(11) = S_i - 1 = H_i(00)$,
$G_i(10) = -1 = H_i(01)$,
$G_i(00) = G_i(01) = 1 = H_i(10) = H_i(11)$.

Now put $S_1 = S_2 = \cdots = S_h = S$ in (7.4) and $\theta_{c_1 c_2 \cdots c_h} = \theta_q$, where $q$ =
number of ones in $(c_1 c_2 \cdots c_h)$; on simplifying, the coefficient of $\theta_q$ on the
right side of (7.4) is given by

(7.6)    $\sum_{(q)} \prod_{i=1}^{h} G_i(c_i d_i)$,

where $\sum_{(q)}$ is summation for those values of $(c_1 c_2 \cdots c_h)$ which have exactly
$q$ ones and $h - q$ zeros. Now if the number of ones in $(d_1 d_2 \cdots d_h)$ is $p$, then
it is easy to prove that

(7.7)    $\sum_{(q)} \prod_{i=1}^{h} G_i(c_i d_i) = \sum_{i} \binom{p}{i}\binom{h - p}{q - i}(-1)^{q-i}(S - 1)^{i}$,

where the sum on the right is over all the values of $i$ such that

$\max(0, p + q - h) \leq i \leq \min(p, q)$.

Hence if there is balance over each order of interaction, $\lambda_{d_1 d_2 \cdots d_h}$ depends
only on the exact number of factors (say $p$) which occur at the same level. This
must be so, as it was proved in section 4. Now writing $\lambda_{d_1 d_2 \cdots d_h}$ as $\lambda_p$, (7.4)
becomes

(7.8)    $\lambda_p = -\frac{k}{S^h} \sum_{q=0}^{h} \Big[\sum_{i} \binom{p}{i}\binom{h - p}{q - i}(-1)^{q-i}(S - 1)^{i}\Big]\theta_q$.

Comparing (7.8) and (4.13) with $m = h$ we obtain

(7.9)    $f^q_p = \frac{1}{S^m} \sum_{i} \binom{p}{i}\binom{m - p}{q - i}(-1)^{q-i}(S - 1)^{i}$.

Working similarly with (7.5) we obtain

(7.10)    $\theta_q = -\frac{1}{k} \sum_{p=0}^{m} \Big[\sum_{j} \binom{m - q}{j}\binom{q}{m - p - j}(-1)^{m-p-j}(S - 1)^{j}\Big]\lambda_p$,

where the inner sum is over all the values of $j$ such that

$\max(0, m - p - q) \leq j \leq \min(m - p, m - q)$.

Hence the inverse relation of (4.13) exists and is given by (7.10). If $g^q_p$ is the
element in the $(p + 1)$th row and $(q + 1)$th column of $[F(m)]^{-1}$, then on comparing
(7.10) and (4.14), we have

(7.11)    $g^q_p = \sum_{j} \binom{m - p}{j}\binom{p}{m - q - j}(-1)^{m-q-j}(S - 1)^{j}$.


Equations (7.9) and (7.11) are not convenient for writing down the matrices
$F(m)$ and $[F(m)]^{-1}$. But the following relations, easily derivable from them, will
enable us to write out these matrices easily, along with a check.

(7.12)    $g^q_0 = \binom{m}{q}(S - 1)^{m-q}$,

(7.13)    $g^0_p = (-1)^p (S - 1)^{m-p}$,

(7.14)    $g^m_p = 1$,

(7.15)    $g^q_m = \binom{m}{q}(-1)^{m-q}$,

(7.16)    $g^{q-1}_{p-1} = g^{q-1}_p + g^q_{p-1} + (S - 1)\,g^q_p$,

(7.17)    $g^q_p = S^m f^{m-q}_{m-p}$.
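The relations of this section lend themselves to a mechanical check. The sketch below assumes the readings $f^q_p = S^{-m}\sum_i \binom{p}{i}\binom{m-p}{q-i}(-1)^{q-i}(S-1)^i$ for (7.9) and $g^q_p = \sum_j \binom{m-p}{j}\binom{p}{m-q-j}(-1)^{m-q-j}(S-1)^j$ for (7.11), with $p$ indexing rows and $q$ columns; the function names and the choice $m = 2$, $S = 3$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def f(m, S, p, q):
    """Element f^q_p of F(m) (row p, column q), as read from (7.9)."""
    total = 0
    for i in range(max(0, p + q - m), min(p, q) + 1):
        total += comb(p, i) * comb(m - p, q - i) * (-1) ** (q - i) * (S - 1) ** i
    return Fraction(total, S ** m)


def g(m, S, p, q):
    """Element g^q_p of the inverse of F(m) (row p, column q), as read from (7.11)."""
    total = 0
    for j in range(max(0, m - p - q), min(m - p, m - q) + 1):
        total += (comb(m - p, j) * comb(p, m - q - j)
                  * (-1) ** (m - q - j) * (S - 1) ** j)
    return total


def matrices(m, S):
    """Return F(m) and its claimed inverse as lists of rows."""
    size = range(m + 1)
    F = [[f(m, S, p, q) for q in size] for p in size]
    Finv = [[g(m, S, p, q) for q in size] for p in size]
    return F, Finv


def matmul(a, b):
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


F, Finv = matrices(2, 3)
print(matmul(F, Finv))  # the identity matrix if the two formulas are consistent
```

Exact fractions are used so that the product can be compared with the identity without rounding.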

8. Remarks. It should be noted that a general class of quasifactorial designs 
as defined by C. R. Rao [4] has the same parameters as given in (7.4). Hence 
the variance of a treatment contrast for any design belonging to that class can 
be obtained from (7.5).

Two factor designs in the above class form an important group. Their analysis 
can be done by using (7.4) and (7.5) with h = 2 and the method given in section 5. 
It will yield the same expressions as given by C. R. Rao and K. R. Nair in 
[10]. They are, therefore, not reproduced here. 

Secondly, construction of PBIB designs with parameters as required in the
above designs is considered by M. N. Vartak [5], D. A. Sprott [6] and C. R. Rao
[4].

Furthermore, in the above design if $\lambda_{00} = \lambda_{01}$ or $\lambda_{10}$, then $\theta_{11} = \theta_{01}$ or $\theta_{10}$ and the
design becomes a group divisible PBIB.

All the designs mentioned in this paper can be successfully used by introducing 
Pseudo-factors. The method of introducing Pseudo-factors is discussed by Kramer 
and Bradley [12] for factorial experiments in group divisible PBIB. 
A TABLE FOR COMPUTING TRIVARIATE NORMAL PROBABILITIES
By George P. Steck
Sandia Corporation 


1. Introduction. For convenience in the following discussion let $X$, $Y$, and $Z$
be random variables with a trivariate normal distribution such that $EX = EY = EZ = 0$,
$EX^2 = EY^2 = EZ^2 = 1$, $EXY = \rho_{12}$, $EXZ = \rho_{13}$, $EYZ = \rho_{23}$, let
$C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ denote the probability that $X \leq h$, $Y \leq k$, $Z \leq m$, and
let $D(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ denote the probability that $X \geq h$, $Y \geq k$, $Z \geq m$.
Several tables have been prepared from which certain particular values of the
trivariate normal integral can be obtained. A tabulation of the area of hyperspherical
simplices is given by H. Ruben [1]. The function Ruben has tabulated
as $t_n(x)$ is, for the case $n = 3$, equal to $C(0, 0, 0; 1/x, 1/x, 1/x)$ and the tabulation
is for $x = 2(1)11$. This probability can be computed directly, however, as a
special case of the well-known formula (for example, see [2])

(1.1)    $C(0, 0, 0; \rho_{12}, \rho_{13}, \rho_{23}) = D(0, 0, 0; \rho_{12}, \rho_{13}, \rho_{23})$
         $= \frac{1}{4\pi}(2\pi - \arccos\rho_{12} - \arccos\rho_{13} - \arccos\rho_{23})$.
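Formula (1.1) is easy to evaluate directly. The following minimal sketch assumes the form $(2\pi - \arccos\rho_{12} - \arccos\rho_{13} - \arccos\rho_{23})/4\pi$; the function name is introduced here for illustration.

```python
import math


def orthant_probability(r12, r13, r23):
    """C(0, 0, 0; r12, r13, r23) from formula (1.1)."""
    return (2 * math.pi - math.acos(r12) - math.acos(r13)
            - math.acos(r23)) / (4 * math.pi)


# Independent components: each of the eight octants has probability 1/8.
print(orthant_probability(0.0, 0.0, 0.0))
# Equal correlations 1/x, the case covered by Ruben's tabulation for n = 3.
print(orthant_probability(1 / 2, 1 / 2, 1 / 2))
```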

Short tabulations of $C(h, h, h; 1/2, 1/2, 1/2)$ have been published by D. Teichroew
[3] for $h\sqrt 2 = 0(.01)6.09$ and by P. N. Somerville [11] for $h = 0(.1)2(.5)3$.
In addition to these published tables, there are some unpublished tables [4]
giving $C(h, h, h; \rho, \rho, \rho)$ for $\rho = 1/(1 + \sqrt 3)$ and $h = 0(.1)3(.5)8$ and for
$\rho = 0(.1)0.9$, $h = 0(.2)1$.
Methods for computing $D(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ have been given by M. G.
Kendall [5], R. L. Plackett [6], and S. C. Das [7]. The method of Kendall is to
express the trivariate normal density as the inverse of its characteristic function,
obtaining $D(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ as a six-dimensional integral. The part of the
integral involving the $\rho_{ij}$ is expanded in a power series and the result integrated
term by term. The resulting series converges slowly, however, when the $\rho_{ij}$ are
large. Plackett's method, on the other hand, is to consider $D(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$
as a function of the $\rho_{ij}$ and write it as a line integral from $(\rho_{12}, \rho_{13}, \rho_{23})$ to
$(\rho'_{12}, \rho'_{13}, \rho'_{23})$, where the $\rho'_{ij}$ are chosen to give a degenerate trivariate normal density so that
$D(h, k, m; \rho'_{12}, \rho'_{13}, \rho'_{23})$ becomes a bivariate normal integral. The result of this
procedure is that $D(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ can be expressed as a sum of lower
dimensional normal integrals and an integral which must be evaluated by numerical
integration.

The method of Das reduces the trivariate integral to a single integral which is
then evaluated numerically provided the correlations are such that their product
is positive and each is numerically greater than the product of the other two.

In this paper $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ is expressed in terms of the univariate
normal integral, the $T$-function, which is tabulated by D. B. Owen [8, 9], and the
function $S(h, a, b)$ which is tabulated here. Although the reduction of
$C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ is given in terms of the $T$-function, it is also possible to give it in
terms of the $V$-function tabulated by C. Nicholson [12] and by the National
Bureau of Standards [13], or the $L$-function tabulated by Karl Pearson [14]
and by the National Bureau of Standards [13]. The $V$ and $L$-functions are related
to the $T$-function by the expressions

(1.2)    $V(h, ah) = \frac{\arctan a}{2\pi} - T(h, a)$,

$L(h, k; \rho) = \frac{1}{2\pi(1 - \rho^2)^{1/2}} \int_h^\infty \int_k^\infty \exp\left[-\tfrac12 (x^2 + y^2 - 2\rho xy)/(1 - \rho^2)\right] dx\,dy$

$\qquad = 1 - \tfrac12\big(G(h) + G(k) + \delta_{hk}\big) - T\Big(h, \frac{k - \rho h}{h(1 - \rho^2)^{1/2}}\Big) - T\Big(k, \frac{h - \rho k}{k(1 - \rho^2)^{1/2}}\Big)$,

where (this is the same $\delta$ defined equivalently by (2.3))

$\delta_{hk} = 1$ if $h < 0$ or $k < 0$ but not both, and $\delta_{hk} = 0$ otherwise.

For $h > 0$, $a > 0$, $b > 0$, $S(h, a, b) - (1/4\pi)\arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big)$ is
the probability that three independent, standardized, normal variables will lie in
the region between the planes $x = 0$, $x - bz = 0$, $y = 0$, and $y = h$ and beyond
(in the sense that $z \geq ay$) the plane $z - ay = 0$, i.e., will lie in the truncated
infinite wedge shown in Figure 1.


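2. Summary of formulas. The fundamental formulas for $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$
are:

Case (i): $h \geq 0$, $k \geq 0$, $m \geq 0$ or $h \leq 0$, $k \leq 0$, $m \leq 0$,

(2.1)    $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = \tfrac12[(1 - \delta_{a_1c_1})G(h) + (1 - \delta_{a_2c_2})G(k) + (1 - \delta_{a_3c_3})G(m)]$
         $- \tfrac12[T(h, a_1) + T(h, c_1) + T(k, a_2) + T(k, c_2) + T(m, a_3) + T(m, c_3)]$
         $- [S(h, a_1, b_1) + S(h, c_1, d_1) + S(k, a_2, b_2) + S(k, c_2, d_2) + S(m, a_3, b_3) + S(m, c_3, d_3)]$,

Case (ii): $h \geq 0$, $k \geq 0$, $m < 0$ or $h \leq 0$, $k \leq 0$, $m > 0$,

(2.2)    $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = \tfrac12[G(h) + G(k) - \delta_{hk}] - T(h, a_1) - T(k, c_2)$
         $- C(h, k, -m; \rho_{12}, -\rho_{13}, -\rho_{23})$,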
Fig. 1. Volume over which $S(h, a, b) - (1/4\pi)\arctan\big(b/\sqrt{1 + a^2 + a^2b^2}\big)$ gives the
integral of the trivariate normal distribution.


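where

(2.3)

$G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$,   $T(h, a) = \frac{1}{2\pi} \int_0^a \frac{\exp[-\tfrac12 h^2(1 + x^2)]}{1 + x^2}\,dx$,

$a_1 = \frac{k - h\rho_{12}}{h(1 - \rho_{12}^2)^{1/2}}$,   $a_2 = \frac{m - k\rho_{23}}{k(1 - \rho_{23}^2)^{1/2}}$,   $a_3 = \frac{h - m\rho_{13}}{m(1 - \rho_{13}^2)^{1/2}}$,

$c_1 = \frac{m - h\rho_{13}}{h(1 - \rho_{13}^2)^{1/2}}$,   $c_2 = \frac{h - k\rho_{12}}{k(1 - \rho_{12}^2)^{1/2}}$,   $c_3 = \frac{k - m\rho_{23}}{m(1 - \rho_{23}^2)^{1/2}}$,

$b_1 = \frac{(1 - \rho_{12}^2)(m - h\rho_{13}) - (\rho_{23} - \rho_{12}\rho_{13})(k - h\rho_{12})}{(k - h\rho_{12})\Delta^{1/2}}$,

$d_1 = \frac{(1 - \rho_{13}^2)(k - h\rho_{12}) - (\rho_{23} - \rho_{12}\rho_{13})(m - h\rho_{13})}{(m - h\rho_{13})\Delta^{1/2}}$,

$b_2 = \frac{(1 - \rho_{23}^2)(h - k\rho_{12}) - (\rho_{13} - \rho_{12}\rho_{23})(m - k\rho_{23})}{(m - k\rho_{23})\Delta^{1/2}}$,

$d_2 = \frac{(1 - \rho_{12}^2)(m - k\rho_{23}) - (\rho_{13} - \rho_{12}\rho_{23})(h - k\rho_{12})}{(h - k\rho_{12})\Delta^{1/2}}$,

$b_3 = \frac{(1 - \rho_{13}^2)(k - m\rho_{23}) - (\rho_{12} - \rho_{13}\rho_{23})(h - m\rho_{13})}{(h - m\rho_{13})\Delta^{1/2}}$,

$d_3 = \frac{(1 - \rho_{23}^2)(h - m\rho_{13}) - (\rho_{12} - \rho_{13}\rho_{23})(k - m\rho_{23})}{(k - m\rho_{23})\Delta^{1/2}}$,

$\Delta = 1 - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 + 2\rho_{12}\rho_{13}\rho_{23}$,

$\delta_{xy} = 0$ if $(\mathrm{sgn}\,x)(\mathrm{sgn}\,y) = 1$, and $\delta_{xy} = 1$ otherwise,

and

$\mathrm{sgn}\,x = 1$ if $x \geq 0$, and $\mathrm{sgn}\,x = -1$ if $x < 0$.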

The $S$-function is tabulated for $0 < b \leq 1$, but it is possible to obtain values
for $1 < b < \infty$ by use of one of the following formulas, $a > 0$, $b > 0$:

(2.4)    $S(h, a, b) = [G(h) - \tfrac12]\,T(ah, b) - [G(hab) - \tfrac12]\,T(ah, 1/a) + S(hab, 1/b, 1/a)$,

(2.5)    $S(h, a, b) = \tfrac14 G(h) + [G(hab) - \tfrac12]\,T(h, a) - S(hab, 1/(ab), a) - S(h, ab, 1/b)$.

If $a > 1$, $b > 1$ then (2.4) should be used, and if $0 < a \leq 1$, $b > 1$ then (2.5)
should be used. Values for negative $h$, $a$, or $b$ may be obtained by using

(2.6)    $S(-h, a, b) = S(\infty, a, b) - S(h, a, b)$,
(2.7)    $S(h, -a, b) = S(h, a, b)$,
(2.8)    $S(h, a, -b) = -S(h, a, b)$.

Note that (2.4) and (2.5) require both $a$ and $b$ to be positive and hence when $a$ or
$b$ is negative (2.7) or (2.8) should be applied before (2.4) or (2.5).
Other useful formulas are:

(2.9)
$S(0, a, b) = \tfrac12 S(\infty, a, b)$,   $S(h, 0, b) = \frac{1}{2\pi} G(h) \arctan b$,

$S(h, a, 0) = 0$,   $S(\infty, a, b) = \frac{1}{2\pi} \arctan\Big(\frac{b}{(1 + a^2 + a^2b^2)^{1/2}}\Big)$,

$S(h, \infty, b) = 0$,

$S(h, a, \infty) = \tfrac12[\tfrac12 G(h) + T(h, |a|)] - \frac{\arctan |a|}{2\pi}$  for $h \geq 0$,

$S(h, a, \infty) = \tfrac12[\tfrac12 G(h) - T(h, |a|)]$  for $h < 0$.

Equations (2.1) and (2.2) can be easily rewritten in terms of the $V$-function
by use of (1.2); however, in order to reduce the computation it should be noted
that

$\arctan a_1 + \arctan c_2 = \arctan\big(\sqrt{1 - \rho_{12}^2}/\rho_{12}\big)$.

Similar expressions hold for the pairs $(a_2, c_3)$ and $(a_3, c_1)$.
Rewriting equations (2.1) and (2.2) in terms of the $L$-function gives

Case (i): $h \geq 0$, $k \geq 0$, $m \geq 0$ or $h \leq 0$, $k \leq 0$, $m \leq 0$,

(2.1)'   $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = (1 - \tfrac12\delta_{a_1c_1})G(h) + (1 - \tfrac12\delta_{a_2c_2})G(k) + (1 - \tfrac12\delta_{a_3c_3})G(m)$
         $+ \tfrac14(\delta_{hk} + \delta_{hm} + \delta_{km}) + \tfrac12[L(h, k; \rho_{12}) + L(h, m; \rho_{13}) + L(k, m; \rho_{23}) - 3]$
         $- [S(h, a_1, b_1) + S(h, c_1, d_1) + S(k, a_2, b_2) + S(k, c_2, d_2) + S(m, a_3, b_3) + S(m, c_3, d_3)]$,

Case (ii): $h \geq 0$, $k \geq 0$, $m < 0$ or $h \leq 0$, $k \leq 0$, $m > 0$,

(2.2)'   $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = L(h, k; \rho_{12}) + G(h) + G(k) - 1 - C(h, k, -m; \rho_{12}, -\rho_{13}, -\rho_{23})$.
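3. Derivation of the relationship between the trivariate normal integral and
the tabulated function. The density function for the standardized trivariate
normal distribution is

(3.1)    $f(x, y, z; \rho_{12}, \rho_{13}, \rho_{23}) = \Big(\frac{1}{2\pi}\Big)^{3/2} \frac{1}{\Delta^{1/2}} \exp\big[-\tfrac12(A_{11}x^2 + A_{22}y^2 + A_{33}z^2 + 2A_{12}xy + 2A_{13}xz + 2A_{23}yz)\big]$,

where

(3.2)    $A_{11} = \frac{1 - \rho_{23}^2}{\Delta}$,   $A_{22} = \frac{1 - \rho_{13}^2}{\Delta}$,   $A_{33} = \frac{1 - \rho_{12}^2}{\Delta}$,

         $A_{12} = \frac{\rho_{13}\rho_{23} - \rho_{12}}{\Delta}$,   $A_{13} = \frac{\rho_{12}\rho_{23} - \rho_{13}}{\Delta}$,   $A_{23} = \frac{\rho_{12}\rho_{13} - \rho_{23}}{\Delta}$,

and

$\Delta = 1 - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 + 2\rho_{12}\rho_{13}\rho_{23}$.

The definition of $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ given earlier is equivalent to

(3.3)    $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = \int_{-\infty}^{h} \int_{-\infty}^{k} \int_{-\infty}^{m} f(x, y, z; \rho_{12}, \rho_{13}, \rho_{23})\,dx\,dy\,dz$.

Let $G(x)$, $T(h, a)$ be as defined in (2.3). It will be convenient to have an alternative
form of $T(h, a)$. This is given in [8] and is

(3.4)    $T(h, a) = \frac{\arctan a}{2\pi} + \tfrac12 G(h) - \tfrac14 - \int_0^h G(ax)G'(x)\,dx$.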




Also, from [9],

(3.5)    $T(h, a) = \tfrac12[G(h) + G(ah)] - G(h)G(ah) - T(ah, 1/a)$,  $a > 0$.

Finally, let

(3.6)    $S(h, a, b) = \int_{-\infty}^{h} T(as, b)G'(s)\,ds$.

It will also be convenient to have an alternative form of (3.6). If the $T$-function
is replaced by its integral representation as given by (2.3) and the order of
integrations reversed, the result is

(3.7)    $S(h, a, b) = \frac{1}{2\pi} \int_0^b \frac{G\big(h(1 + a^2 + a^2x^2)^{1/2}\big)}{(1 + x^2)(1 + a^2 + a^2x^2)^{1/2}}\,dx$.

Integration of (3.6) by parts gives (2.4), and substituting (3.5) into (3.6) and
integrating gives (2.5).
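The single-integral form (3.7) makes $S(h, a, b)$ straightforward to evaluate numerically. The sketch below is an illustration only: it assumes $T(h, a) = \frac{1}{2\pi}\int_0^a \exp[-\tfrac12 h^2(1 + x^2)]/(1 + x^2)\,dx$ and the reading of (3.7) as $\frac{1}{2\pi}\int_0^b G(h\sqrt{1 + a^2 + a^2x^2})/[(1 + x^2)\sqrt{1 + a^2 + a^2x^2}]\,dx$, and it uses a plain composite Simpson rule (the helper `simpson` and its panel count are choices made here, not part of the paper).

```python
import math


def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3.0


def T(h, a):
    """T-function from its integral representation."""
    return simpson(lambda x: math.exp(-0.5 * h * h * (1 + x * x)) / (1 + x * x),
                   0.0, a) / (2 * math.pi)


def S(h, a, b):
    """S(h, a, b) from the single-integral form (3.7)."""
    def integrand(x):
        root = math.sqrt(1 + a * a + a * a * x * x)
        return G(h * root) / ((1 + x * x) * root)
    return simpson(integrand, 0.0, b) / (2 * math.pi)


# Compare both sides of (2.4) at an arbitrary point with a > 1, b > 1.
h, a, b = 0.7, 2.0, 3.0
left = S(h, a, b)
right = ((G(h) - 0.5) * T(a * h, b) - (G(h * a * b) - 0.5) * T(a * h, 1 / a)
         + S(h * a * b, 1 / b, 1 / a))
print(left, right)
```

The same routine can be used to compare the special values listed in (2.9), or the two sides of (2.5) when $0 < a \leq 1$ and $b > 1$.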

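The relation between $C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23})$ and the $S$-function can be shown
as follows. If $h$, $k$, and $m$ are all nonnegative (or nonpositive) and if $0/0$ is taken
as one, then it can be shown that

(3.8)    $P(X \leq h, Y \leq k, Z \leq m) = P\big(X \leq h,\ Y \leq \tfrac{k}{h}X,\ Z \leq \tfrac{m}{h}X\big)$
         $+ P\big(X \leq \tfrac{h}{k}Y,\ Y \leq k,\ Z \leq \tfrac{m}{k}Y\big) + P\big(X \leq \tfrac{h}{m}Z,\ Y \leq \tfrac{k}{m}Z,\ Z \leq m\big)$.

Since these three probabilities are all similar in form, it is sufficient to consider
only the last. Let the conditional probability, given $Z = s$, that $X \leq hs/m$
and $Y \leq ks/m$ be denoted by $A(s)$; then

(3.9)    $A(s) = B\Big(\frac{h - m\rho_{13}}{m(1 - \rho_{13}^2)^{1/2}}\,s,\ \frac{k - m\rho_{23}}{m(1 - \rho_{23}^2)^{1/2}}\,s;\ \frac{\rho_{12} - \rho_{13}\rho_{23}}{((1 - \rho_{13}^2)(1 - \rho_{23}^2))^{1/2}}\Big)$,

where

(3.10)    $B(h, k; \rho) = \frac{1}{2\pi(1 - \rho^2)^{1/2}} \int_{-\infty}^{h} \int_{-\infty}^{k} \exp\big[-\tfrac12(x^2 - 2\rho xy + y^2)/(1 - \rho^2)\big]\,dx\,dy$.

Therefore,

(3.11)    $P\big(X \leq \tfrac{h}{m}Z,\ Y \leq \tfrac{k}{m}Z,\ Z \leq m\big) = \int_{-\infty}^{m} A(s)G'(s)\,ds$.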
However, it is shown in [9] that

(3.12)    $B(h, k; \rho) = \tfrac12[G(h) + G(k)] - T\Big(h, \frac{k - \rho h}{h(1 - \rho^2)^{1/2}}\Big) - T\Big(k, \frac{h - \rho k}{k(1 - \rho^2)^{1/2}}\Big) - \tfrac12\delta_{hk}$,

where $\delta_{hk}$ has already been defined by (2.3); therefore, expressing (3.9) in the
form of (3.12), substituting in (3.11), and noting (3.4) it follows that

(3.13)    $P\big(X \leq \tfrac{h}{m}Z,\ Y \leq \tfrac{k}{m}Z,\ Z \leq m\big) = \tfrac12(1 - \delta_{a_3c_3})G(m) - \tfrac12[T(m, a_3) + T(m, c_3)]$
          $- \int_{-\infty}^{m} G'(s)\,T(a_3 s, b_3)\,ds - \int_{-\infty}^{m} G'(s)\,T(c_3 s, d_3)\,ds$,

where $a_3$, $b_3$, $c_3$, $d_3$, and $\delta_{a_3c_3}$ have already been defined by (2.3). The integrals on
the right side of (3.13) are, noting (3.6), $S(m, a_3, b_3)$ and $S(m, c_3, d_3)$. Thus

(3.14)    $P\big(X \leq \tfrac{h}{m}Z,\ Y \leq \tfrac{k}{m}Z,\ Z \leq m\big) = \tfrac12(1 - \delta_{a_3c_3})G(m)$
          $- \tfrac12[T(m, a_3) + T(m, c_3)] - [S(m, a_3, b_3) + S(m, c_3, d_3)]$.


The other two probabilities on the right side of (3.8) can be obtained from (3.14)
by replacing $m, a_3, b_3, c_3, d_3$ by $h, a_1, b_1, c_1, d_1$ and $k, a_2, b_2, c_2, d_2$, respectively.
Summing the expressions for these three probabilities gives (2.1). Equation (2.2)
follows by noting that if $h$, $k$, and $m$ are nonnegative or nonpositive, then

$C(h, k, m; \rho_{12}, \rho_{13}, \rho_{23}) = \int_{-\infty}^{h} \int_{-\infty}^{k} \int_{-\infty}^{m} f(x, y, z; \rho_{12}, \rho_{13}, \rho_{23})\,dx\,dy\,dz$

$\qquad = \int_{-\infty}^{h} \int_{-\infty}^{k} \int_{-m}^{\infty} f(x, y, z; \rho_{12}, -\rho_{13}, -\rho_{23})\,dx\,dy\,dz$

$\qquad = \int_{-\infty}^{h} \int_{-\infty}^{k} \Big(\int_{-\infty}^{\infty} - \int_{-\infty}^{-m}\Big) f(x, y, z; \rho_{12}, -\rho_{13}, -\rho_{23})\,dx\,dy\,dz$

$\qquad = B(h, k; \rho_{12}) - C(h, k, -m; \rho_{12}, -\rho_{13}, -\rho_{23})$.


The reader can verify that the familiar expression

$$C(0, 0, 0; \rho_{12}, \rho_{13}, \rho_{23}) = \frac{1}{4\pi}\,(2\pi - \arccos \rho_{12} - \arccos \rho_{13} - \arccos \rho_{23})$$

holds when h = k = m = 0 is substituted in (2.1).
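
The closed form for the orthant probability is easy to evaluate directly. The short Python sketch below (the function name is illustrative, not from the paper) evaluates it for two cases whose values are known independently: independent variables, where the probability is 1/8, and the equicorrelated case with all correlations 1/2, where it is 1/4.

```python
import math

def orthant_probability(rho12, rho13, rho23):
    """C(0, 0, 0; rho12, rho13, rho23) from the arccos closed form."""
    total_angle = math.acos(rho12) + math.acos(rho13) + math.acos(rho23)
    return (2.0 * math.pi - total_angle) / (4.0 * math.pi)

print(orthant_probability(0.0, 0.0, 0.0))  # independent components
print(orthant_probability(0.5, 0.5, 0.5))  # equicorrelated components
```

The formula is only meaningful when the three correlations form a valid (positive definite) correlation matrix; the sketch does not check this.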

If the G-function in the integrand of (3.7) is expanded in a Taylor series to
three terms with remainder about the point $h(1 + a^2 + a^2(b/2)^2)^{1/2}$, then the
following limited expansion can be shown to hold for S(h, a, b).

$$\begin{aligned}
S(h, a, b) ={}& \frac{1}{2\pi}\, G\big(h(1 + a^2 + a^2(b/2)^2)^{1/2}\big) \arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big) \\
&+ \frac{h}{2\pi}\, G'\big(h(1 + a^2 + a^2(b/2)^2)^{1/2}\big) \cdot A_1(a, b) \\
&+ \frac{h^2}{2 \cdot 2\pi}\, G''\big(h(1 + a^2 + a^2(b/2)^2)^{1/2}\big) \cdot A_2(a, b) \\
&+ \theta\, \frac{h^3}{6 \cdot 2\pi} \sup_{0 < \xi < 1} G'''\big(h(1 + a^2 + a^2\xi^2)^{1/2}\big) \cdot A_3(a, b),
\end{aligned} \tag{3.15}$$

where $|\theta| \le 1$, and

$$\begin{aligned}
A_1(a, b) ={}& \arctan b - \{1 + a^2 + a^2(b/2)^2\}^{1/2} \arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big), \\
A_2(a, b) ={}& \{2 + a^2 + a^2(b/2)^2\} \arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big) - 2\{1 + a^2 + a^2(b/2)^2\}^{1/2} \arctan b \\
&+ a\{\log(ab + (1 + a^2 + a^2b^2)^{1/2}) - \tfrac12 \log(1 + a^2)\}, \\
A_3(a, b) ={}& a^2b + \{4 + 3a^2 + 3a^2(b/2)^2\} \arctan b - \{1 + a^2 + a^2(b/2)^2\}^{3/2} \arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big) \\
&- 3\{1 + a^2 + a^2(b/2)^2\}^{1/2} \big[\arctan\big(b/(1 + a^2 + a^2b^2)^{1/2}\big) + a\{\log(ab + (1 + a^2 + a^2b^2)^{1/2}) - \tfrac12 \log(1 + a^2)\}\big].
\end{aligned}$$

If the first term of the series is used, the maximum error is one in the fourth
decimal place for $h \le 2$ and one in the fifth decimal place for $h > 2$, and if
the first three terms of this series are used, the maximum error encountered will
be less than six in the sixth decimal place (note from (2.9) that the arc-tangent
terms in the series can be read from the $h = \infty$ entries in the table).
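
As a rough illustration of the first-term approximation, the Python sketch below evaluates the leading term of the expansion, read here as $\frac{1}{2\pi}G(h(1+a^2+a^2(b/2)^2)^{1/2})\arctan(b/(1+a^2+a^2b^2)^{1/2})$, and compares it with two values of S quoted in the numerical example of Section 5. The function names are illustrative, and the reading of the leading term is an assumption based on the reconstruction of (3.15).

```python
import math

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def s_first_term(h, a, b):
    """Leading term of the limited expansion of S(h, a, b)."""
    u_mid = math.sqrt(1.0 + a * a + a * a * (b / 2.0) ** 2)
    s_at_infinity = math.atan(b / math.sqrt(1.0 + a * a + a * a * b * b)) / (2.0 * math.pi)
    return G(h * u_mid) * s_at_infinity

# (h, a, b) -> value of S quoted in Section 5 of the paper.
quoted = {
    (1.2, 0.6293828, -0.7470863): -0.0783075,
    (0.5, 2.6536139, -0.4252646): -0.0204185,
}
for args, value in quoted.items():
    approx = s_first_term(*args)
    print(args, approx, abs(approx - value))
```

For these two parameter sets the discrepancy should be a few units in the fifth decimal place, which is consistent with the error statement above.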


4. Description of the table. The values of S(h, a,b) given in the table were 
computed using a seven-point Gaussian quadrature formula on (3.7). The G- 
function in the integrand of (3.7) was approximated by a formula of C. Hastings 
(see [10], p. 187). A check of the computations was made for selected parameter 
values first by using an eight-point Gaussian quadrature formula with an im- 
proved method for evaluating G(x) and second by using a sixteen-point Gaussian 
quadrature formula with the same improved method for evaluating G(x). These 
two checks agreed with each other to nine decimal places and differed from the 
initially computed values by at most one in the eighth decimal place. These checks 
indicate that the tabulated values may occasionally be off by as much as 0.6 in 
the seventh decimal place because of rounding errors. Any number in the table 
whose last nonzero digit is a five is followed by a plus or minus sign to indicate 
that the number should be rounded up or down, respectively, when dropping 
the five. 
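
The quadrature described above can be sketched in Python. The block applies a seven-point Gauss-Legendre rule to the integral form of S, taken here to be $\frac{1}{2\pi}\int_0^b G(h(1+a^2+a^2y^2)^{1/2})/\{(1+y^2)(1+a^2+a^2y^2)^{1/2}\}\,dy$; that reading of (3.7) is an assumption. It uses the error function for G rather than the Hastings approximation, and the function names are illustrative.

```python
import math

# Nonnegative nodes and matching weights of the 7-point Gauss-Legendre rule on [-1, 1].
NODES = (0.0, 0.4058451513773972, 0.7415311855993945, 0.9491079123427585)
WEIGHTS = (0.4179591836734694, 0.3818300505051189,
           0.2797053914892766, 0.1294849661688697)

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrand(y, h, a):
    """Integrand of the assumed form of (3.7)."""
    u = math.sqrt(1.0 + a * a + a * a * y * y)
    return G(h * u) / ((1.0 + y * y) * u)

def s_gauss7(h, a, b):
    """Seven-point Gauss-Legendre estimate of S(h, a, b) over [0, b]."""
    half = b / 2.0  # maps [-1, 1] onto [0, b]; the sign of b is carried along
    total = WEIGHTS[0] * integrand(half, h, a)
    for t, w in zip(NODES[1:], WEIGHTS[1:]):
        total += w * (integrand(half * (1.0 + t), h, a)
                      + integrand(half * (1.0 - t), h, a))
    return half * total / (2.0 * math.pi)

print(s_gauss7(1.2, 0.6293828, -0.7470863))  # compare with S_2 in Section 5
print(s_gauss7(0.5, 2.6536139, -0.4252646))  # compare with S_5 in Section 5
```

A single seven-point rule is only reliable here for moderate |b|; for large |b| the interval would have to be subdivided or a reduction formula applied first.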

The range of parameter values for which the S-function is tabulated was 
chosen so that outside the table S(h, a, b) may be approximated by the first term 
of (3.15) with an error not exceeding five in the fifth decimal place. 

The accuracy of linear interpolation in the table was checked empirically in
the following way. Let $\Delta h$, $\Delta a$, $\Delta b$ denote the intervals of tabulation on h, a, b,
respectively. The check was performed by computing $S(h + \tfrac12\Delta h,\ a + \tfrac12\Delta a,\ b + \tfrac12\Delta b)$
for a systematic selection of h, a, and b. Even though the errors found
in this way are not necessarily the maximum errors in the various incremental
cubes, it is felt that they are a reasonable approximation to these maximum
errors. The errors found varied from about one to thirty in the fifth decimal place,
which indicates that linear interpolation anywhere in the table should give an
error of less than four or five in the fourth decimal place.





788 GEORGE P. STECK 





5. A numerical example. In [6] Plackett applies his reduction method to the 
computation of 


$$\begin{aligned}
D(-1.2, -1.0, 0.5;\ 0.7, 0.2, -0.4) &= C(1.2, 1.0, -0.5;\ 0.7, 0.2, -0.4) \\
&= B(1.2, 1.0;\ 0.7) - C(1.2, 1.0, 0.5;\ 0.7, -0.2, 0.4).
\end{aligned}$$

The numerical values of the constants defined by (2.3) are:

    a_1 =  0.1867040    b_1 =  4.0873367    h = 1.2
    a_2 =  0.1091089    b_2 = 10.5175180    k = 1.0
    a_3 =  2.6536139    b_3 = -0.4252646    m = 0.5
    c_1 =  0.6293828    c_2 =  0.7001401    c_3 = 1.7457432
    d_1 = -0.7470863    d_2 =  1.3079477    d_3 = 1.3146897,

and, therefore, by (2.1)

$$\begin{aligned}
C(1.2, 1.0, 0.5;\ 0.7, -0.2, 0.4) ={}& \tfrac12[G(1.2) + G(1) + G(0.5)] \\
&- \tfrac12[T_1(1.2, 0.1867040) + T_2(1.2, 0.6293828) + T_3(1, 0.1091089) \\
&\qquad + T_4(1, 0.7001401) + T_5(0.5, 2.6536139) + T_6(0.5, 1.7457432)] \\
&- [S_1(1.2, 0.1867040, 4.0873367) + S_2(1.2, 0.6293828, -0.7470863) \\
&\qquad + S_3(1, 0.1091089, 10.5175180) + S_4(1, 0.7001401, 1.3079477) \\
&\qquad + S_5(0.5, 2.6536139, -0.4252646) + S_6(0.5, 1.7457432, 1.3146897)].
\end{aligned}$$


Tables of the G-function give $\tfrac12[G(1.2) + G(1) + G(0.5)] = 1.2088688$, and the
tables in [9] or [10] give

$$-\tfrac12 \sum T_i = -0.2025741, \qquad B(1.2, 1.0;\ 0.7) = 0.7940171.$$

Applying (2.5) to compute S,, S:, and S; and (2.4) to compute S,, one finds 
    S_1 = 0.1808805,    S_2 = -0.0783075,    S_3 = 0.1927877,
    S_4 = 0.1016940,    S_5 = -0.0204185,    S_6 = 0.0562510,

and $-\sum S_i = -0.4328872$, giving $C(1.2, 1.0, 0.5;\ 0.7, -0.2, 0.4) = 0.5734075$,
and $D(-1.2, -1.0, 0.5;\ 0.7, 0.2, -0.4) = 0.2206096$.

If the bivariate probability $P(X > -1.0,\ Y > 0.5;\ \rho = -0.559714)$, incorrectly
computed by Plackett as 0.587191, is given its correct value of 0.204267,
then Plackett's answer is

$$D(-1.2, -1.0, 0.5;\ 0.7, 0.2, -0.4) = 0.220610,$$

and the answers agree to six decimal places.
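
The arithmetic of this example can be re-traced with a few lines of Python. The sketch below takes the T and S values exactly as quoted above, recomputes the G terms with the error function, and also recomputes $a_3$ and $c_3$ on the assumption that they are the coefficients of $s$ in the first two arguments of (3.9) with $\rho_{13} = -0.2$ and $\rho_{23} = 0.4$. The variable names are illustrative.

```python
import math

def G(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

h, k, m = 1.2, 1.0, 0.5
rho13, rho23 = -0.2, 0.4

half_sum_G = 0.5 * (G(h) + G(k) + G(m))
half_sum_T = 0.2025741   # (1/2) * (T_1 + ... + T_6), as quoted from the tables
S = [0.1808805, -0.0783075, 0.1927877, 0.1016940, -0.0204185, 0.0562510]
B = 0.7940171            # B(1.2, 1.0; 0.7), as quoted from the tables

C = half_sum_G - half_sum_T - sum(S)
D = B - C

# Assumed definitions of a_3 and c_3, read off from (3.9).
a3 = (h - m * rho13) / (m * math.sqrt(1.0 - rho13 ** 2))
c3 = (k - m * rho23) / (m * math.sqrt(1.0 - rho23 ** 2))

print(half_sum_G, sum(S), C, D)
print(a3, c3)
```

Agreement of the recomputed $a_3$ and $c_3$ with the listed constants supports the assumed reading of (3.9), but it is not a check on the tabulated T and S values themselves.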


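6. Extension of method to higher dimensions. Equation (3.8) can be generalized
to any number of dimensions giving

$$P(X_1 \le u_1, \ldots, X_n \le u_n) = \sum_{i=1}^{n} P\Big(X_i \le u_i,\ X_j \le \frac{u_j}{u_i}X_i \ \text{for all } j \ne i\Big), \tag{6.1}$$

provided all the $u_i$'s are nonnegative (or nonpositive) and 0/0 is taken as one.
Each term on the right side of (6.1) is expressible as an integral of a lower dimensional
probability, for example,

$$P\Big(X_1 \le \frac{u_1}{u_n}X_n, \ldots, X_{n-1} \le \frac{u_{n-1}}{u_n}X_n,\ X_n \le u_n\Big) = \int_{-\infty}^{u_n} P\Big(X_1 \le \frac{u_1}{u_n}s, \ldots, X_{n-1} \le \frac{u_{n-1}}{u_n}s \,\Big|\, X_n = s\Big)\, G'(s)\, ds. \tag{6.2}$$

Since the three-dimensional normal distribution can be tabulated as a function
of three variables, it follows by mathematical induction, using (6.1) and (6.2),
that the n-dimensional normal distribution can be tabulated as a function of n
variables.
As an example, consider the case n = 4. If $EX_i = 0$, $EX_i^2 = 1$, and $EX_iX_j = \rho_{ij}$
then the probability in the integrand of (6.2) can be expressed as

$$P\Big(X_1 \le \frac{u_1}{u_4}s,\ X_2 \le \frac{u_2}{u_4}s,\ X_3 \le \frac{u_3}{u_4}s \,\Big|\, X_4 = s\Big) = C(\alpha_{41}s, \alpha_{42}s, \alpha_{43}s;\ \rho'_{12}, \rho'_{13}, \rho'_{23}), \tag{6.3}$$

where

$$\alpha_{4i} = \frac{u_i - u_4\rho_{i4}}{u_4(1 - \rho_{i4}^2)^{1/2}}, \qquad \rho'_{ij} = \frac{\rho_{ij} - \rho_{i4}\rho_{j4}}{\big((1 - \rho_{i4}^2)(1 - \rho_{j4}^2)\big)^{1/2}}.$$

Therefore, (6.2) can be written as

$$P\Big(X_1 \le \frac{u_1}{u_4}X_4,\ X_2 \le \frac{u_2}{u_4}X_4,\ X_3 \le \frac{u_3}{u_4}X_4,\ X_4 \le u_4\Big) = \int_{-\infty}^{u_4} C(\alpha_{41}s, \alpha_{42}s, \alpha_{43}s;\ \rho'_{12}, \rho'_{13}, \rho'_{23})\, G'(s)\, ds. \tag{6.4}$$

If the integrand of (6.4) is expressed by (2.10) and the result integrated, it is
apparent that the left side of (6.1) can be expressed in terms of the G-, T-, and
S-functions and integrals of the form

$$R(h, a, b, c) = \int_{-\infty}^{h} S(as, b, c)\, G'(s)\, ds. \tag{6.5}$$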
ADMISSIBLE AND MINIMAX INTEGER-VALUED ESTIMATORS OF AN
INTEGER-VALUED PARAMETER¹


By D. S. ROBSON
Cornell University


1. Summary. The decision problem considered here is that of deciding which
element of a finite parametric family of probability distributions $p(x, \mu)$ represents
the true distribution of the statistic $X$. It is assumed that $p(x, \mu)$ satisfies
certain regularity conditions which essentially require that the parameter $\mu$ be
integer-valued with known bounds and that $p(x, \mu_1)/p(x, \mu_0)$ be an increasing
function of $x$ whenever $\mu_0 < \mu_1$. Complete classes are characterized for various
loss functions $W(\mu, \alpha)$ which are convex functions of the decision $\alpha$ for each
fixed value of $\mu$. Minimax procedures are considered for the case $W(\mu, \alpha) = |\alpha - \mu|$.


2. Introduction. The problem of estimating an integer-valued parameter is
viewed as a special case of Wald's general statistical decision problem. The
chance variable $X$ is known to be distributed over the sample space $M$ according
to a probability distribution $p(x, \mu)$ depending upon a single unknown integer-valued
parameter $\mu$ with the known bounds $0 \le \mu \le N$. The statistician is
required to make one of $N + 1$ decisions, corresponding to the $N + 1$ different
possible values of $\mu$, on the basis of a single observed value of $X$. A decision
function $\delta$ therefore has the form

$$\delta(x) = (\delta_0(x), \delta_1(x), \ldots, \delta_N(x)) \tag{1}$$

where $\delta_\alpha(x) \ge 0$ for $\alpha = 0, 1, \ldots, N$ and $\sum_{\alpha=0}^{N} \delta_\alpha(x) = 1$ for all $x$ in $M$, with
the interpretation that when the procedure $\delta$ is used and the observed value of $X$
is $x_0$ then the decision that the true distribution of $X$ is $p(x, \alpha)$ is to be made with
probability $\delta_\alpha(x_0)$, $\alpha = 0, 1, \ldots, N$. The loss associated with the decision $\alpha$
when the true value of the parameter is $\mu$ is expressed by a loss function $W(\mu, \alpha)$
which, for each fixed value of $\mu$, is a convex function of $\alpha$ with $W(\mu, \mu) = 0$ and,
for $\alpha$ between $\mu$ and $\beta$, $W(\mu, \alpha) < W(\mu, \beta)$.

The following regularity conditions are imposed upon the function $p(x, \mu)$.

Condition 1. $p(y, \mu)p(x, \nu) < p(x, \mu)p(y, \nu)$ if and only if $p(y, \mu)p(x, \nu)$ and
$p(x, \mu)p(y, \nu)$ are not both zero and $x < y$, $\mu < \nu$.

Condition 2. If $p(x, \nu) = 0$ for all $x$ in $M$ then $p(x, \mu) = 0$ for all $x$ in $M$ either
for every $\mu \le \nu$ or for every $\mu \ge \nu$.

Condition 3. If $M = (x_0, x_1, \ldots, x_n)$, $x_{i-1} < x_i$, then for every $i$, $0 < i \le n$,
there exists an integer $\mu_i$ such that $p(x_{i-1}, \mu_i) > 0$ and $p(x_i, \mu_i) > 0$.

Conditions 1 and 2 are essentially a more precise way of saying that the likelihood
ratio $p(x, \nu)/p(x, \mu)$ is a strictly increasing function of $x$ whenever $\mu < \nu$.
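
Condition 1 can be checked mechanically for any finite family. The Python sketch below does this with exact rational arithmetic. The binomial family with success probability $(\mu+1)/(N+2)$ is a hypothetical example chosen for illustration and does not come from the paper, and the function names are likewise illustrative. Only pairs with $x < y$ and $\mu < \nu$ need to be examined, since the remaining cases follow by symmetry.

```python
from fractions import Fraction
from math import comb

def satisfies_condition_1(p, xs, mus):
    """True if p(y,mu)p(x,nu) < p(x,mu)p(y,nu) for all x < y, mu < nu,
    except where both products vanish (xs and mus must be sorted)."""
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            for j, mu in enumerate(mus):
                for nu in mus[j + 1:]:
                    left = p(y, mu) * p(x, nu)
                    right = p(x, mu) * p(y, nu)
                    if (left != 0 or right != 0) and not left < right:
                        return False
    return True

N, n = 4, 3

def binomial_family(x, mu):
    """Hypothetical family: binomial(n, (mu + 1) / (N + 2))."""
    theta = Fraction(mu + 1, N + 2)
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def flat_family(x, mu):
    """A family that does not depend on mu, so the likelihood ratio is constant."""
    return Fraction(1, n + 1)

xs, mus = list(range(n + 1)), list(range(N + 1))
print(satisfies_condition_1(binomial_family, xs, mus))
print(satisfies_condition_1(flat_family, xs, mus))
```

The first family has a strictly increasing likelihood ratio and passes; the second has a constant likelihood ratio and fails, because the two products are equal and nonzero.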
A simple but useful consequence of Condition 1 is the following

LEMMA 1. If the distribution $p(x, \mu)$ satisfies Condition 1 and if $p(y, \alpha) = 0$ and
if there exists a pair $x, \beta$ such that $p(x, \alpha) > 0$ and $p(y, \beta) > 0$ then either

(i) $p(y, \mu) = 0$ for all $\mu \le \alpha$ and $p(x, \alpha) = 0$ for all $x \ge y$, or

(ii) $p(y, \mu) = 0$ for all $\mu \ge \alpha$ and $p(x, \alpha) = 0$ for all $x \le y$.


3. A Karlin-Rubin Complete Class Theorem. A general approach to decision
problems involving distributions with a monotone likelihood ratio has been
developed by H. Rubin [1] and S. Karlin and H. Rubin [2]. Since the finite action
problem posed here represents a special case of the Karlin-Rubin problem, a
direct application of their results concerning completeness of the class of monotone
decision procedures gives the following

THEOREM 1. Let $C$ be the class of decision functions such that

(i) for every $x$ in $M$ there exists an integer $\alpha_x$ such that $\delta_{\alpha_x}(x) + \delta_{\alpha_x+1}(x) = 1$

(ii) $\delta_\alpha(x) > 0$ only if $p(x, \alpha) > 0$

(iii) if $x < y$ then $\bar\alpha_x = \alpha_x\delta_{\alpha_x}(x) + (\alpha_x + 1)\delta_{\alpha_x+1}(x) \le \bar\alpha_y$.

If $p(x, \mu)$ satisfies Conditions 1 and 2 and if, for each fixed $\mu$, the loss function
$W(\mu, \alpha)$ is a convex function of $\alpha$ with $W(\mu, \mu) = 0$ and, for $\alpha$ between $\mu$ and $\beta$,
$W(\mu, \alpha) < W(\mu, \beta)$ then the class $C$ is complete.

The theorem remains valid under weaker conditions on the loss function²;
however, in what follows only convex loss functions are considered.


4. Admissible procedures when $W(\mu, \alpha) = |\alpha - \mu|^k$ for large k. The class $C$
may, under the hypotheses of Theorem 1, contain inadmissible procedures.
This is effectively demonstrated by the special case where $W(\mu, \alpha)$ is a convex
function of $|\alpha - \mu|$ and increases very rapidly with $|\alpha - \mu|$. $W(\mu, \alpha) = |\alpha - \mu|^k$
is one example of such a loss function and, clearly, any convex function
$W(|\alpha - \mu|)$ with $W(0) = 0$ can be dominated by $K|\alpha - \mu|^k$ by choosing the
constants $K$ and $k$ sufficiently large. The most stringent requirements for admissibility
are then encountered when the range of $x$ for which $p(x, \mu) > 0$ is independent
of $\mu$; in particular,

THEOREM 2. If $p(x, \mu)$ satisfies Condition 1 and $p(x, \mu) > 0$ for all integer pairs
$(x, \mu)$ such that $0 \le x \le n$, $0 \le \mu \le N$, and $p(x, \mu) = 0$ otherwise, then there
exists an integer $k_0 > 0$ such that if $W(\mu, \alpha) = |\alpha - \mu|^k$ and $k \ge k_0$ then every
admissible procedure is of the form

$$\begin{aligned}
\delta_\alpha(x) &= 1 \quad \text{for } x < y \\
\delta_\alpha(y) + \delta_{\alpha+1}(y) &= 1 \\
\delta_{\alpha+1}(x) &= 1 \quad \text{for } x > y
\end{aligned}$$

where $0 \le y \le n$, $0 \le \alpha \le N$.


² The author proved the theorem as it is stated and a referee pointed out that the
Karlin-Rubin theorem for the finite action problem includes this result.
PROOF OF THEOREM 2. The conclusion is obtained by showing that, under
the hypotheses specified, every Bayes solution has this form when $k$ is sufficiently
large. Let $\xi = (\xi_0, \xi_1, \ldots, \xi_N)$ be an a priori distribution on the parameter
space and let $r(\xi, \delta)$ be the integrated risk of the procedure $\delta$. Then $\delta^\xi$ is said to be
a Bayes solution relative to $\xi$ if $\inf_\delta r(\xi, \delta) = r(\xi, \delta^\xi)$. For every $\xi$, however, there
exists a non-randomized Bayes solution; consequently, if

$$r_x(\xi, \alpha) = \sum_{\mu=0}^{N} |\alpha - \mu|^k\, p(x, \mu)\, \xi_\mu,$$

then

$$r(\xi, \delta^\xi) = \sum_x \inf_\alpha r_x(\xi, \alpha).$$

The function $r_x(\xi, \alpha)$ is seen to have the following properties:

I: If $r_x(\xi, \alpha) \le r_x(\xi, \alpha + 1)$ then $r_x(\xi, \alpha + \beta) < r_x(\xi, \alpha + \beta + 1)$ for all $\beta > 0$.

II: If $r_y(\xi, \alpha) \le r_y(\xi, \alpha + 1)$ then $r_x(\xi, \alpha) < r_x(\xi, \alpha + 1)$ for all $x < y$.

III: If $r_0(\xi, \alpha - 1) \le r_0(\xi, \alpha)$ then $r_n(\xi, \alpha) < r_n(\xi, \alpha + 1)$ for all $k \ge k_0$.

Let $\Delta_k(a) = (a + 1)^k - a^k$; then $r_x(\xi, \alpha) \le r_x(\xi, \alpha + 1)$ is equivalent to

$$\sum_{\mu=\alpha+1}^{N} \Delta_k(\mu - \alpha - 1)\, p(x, \mu)\, \xi_\mu \le \sum_{\mu=0}^{\alpha} \Delta_k(\alpha - \mu)\, p(x, \mu)\, \xi_\mu.$$


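Then property I follows from

$$\begin{aligned}
\sum_{\mu=\alpha+\beta+1}^{N} \Delta_k(\mu - \alpha - \beta - 1)\, p(x, \mu)\, \xi_\mu
&\le \sum_{\mu=\alpha+1}^{N} \Delta_k(\mu - \alpha - 1)\, p(x, \mu)\, \xi_\mu \\
&\le \sum_{\mu=0}^{\alpha} \Delta_k(\alpha - \mu)\, p(x, \mu)\, \xi_\mu
\le \sum_{\mu=0}^{\alpha+\beta} \Delta_k(\alpha + \beta - \mu)\, p(x, \mu)\, \xi_\mu,
\end{aligned} \tag{2}$$

for all $\beta$, $0 < \beta \le N - \alpha - 1$. Since $p(x, \mu) > 0$, $0 \le x \le n$, $0 \le \mu \le N$, then
either the first or last inequality (or both) of (2) is strict for $k > 1$. Property II
is a direct result of the restrictions upon $p(x, \mu)$, for if $r_y(\xi, \alpha) \le r_y(\xi, \alpha + 1)$
then, since $p(x, \mu)$ satisfies Condition 1 and $p(x, \mu) > 0$, $0 \le x \le n$, $0 \le \mu \le N$,

$$\begin{aligned}
\sum_{\mu=\alpha+1}^{N} \Delta_k(\mu - \alpha - 1)\, \frac{p(x, \mu)}{p(x, \alpha)}\, \xi_\mu
&\le \sum_{\mu=\alpha+1}^{N} \Delta_k(\mu - \alpha - 1)\, \frac{p(y, \mu)}{p(y, \alpha)}\, \xi_\mu \\
&\le \sum_{\mu=0}^{\alpha} \Delta_k(\alpha - \mu)\, \frac{p(y, \mu)}{p(y, \alpha)}\, \xi_\mu
\le \sum_{\mu=0}^{\alpha} \Delta_k(\alpha - \mu)\, \frac{p(x, \mu)}{p(x, \alpha)}\, \xi_\mu
\end{aligned} \tag{3}$$
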

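for all $x \le y$, with either the first or last inequality (or both) of (3) being strict
for all $k$. Property III is derived by noting that if $r_0(\xi, \alpha - 1) \le r_0(\xi, \alpha)$ then

$$-\sum_{\mu=0}^{\alpha-1} \Delta_k(\alpha - 1 - \mu)\, \frac{p(0, \mu)}{p(0, 0)}\, \xi_\mu + \sum_{\mu=\alpha}^{N} \Delta_k(\mu - \alpha)\, \frac{p(0, \mu)}{p(0, 0)}\, \xi_\mu \le 0$$

and

$$\begin{aligned}
\frac{\Delta_k(\alpha - 1)}{p(n, 0)}\, \big(r_n(\xi, \alpha + 1) - r_n(\xi, \alpha)\big)
\ge{}& \Delta_k(\alpha) \left(-\sum_{\mu=0}^{\alpha-1} \Delta_k(\alpha - 1 - \mu)\, \frac{p(0, \mu)}{p(0, 0)}\, \xi_\mu + \sum_{\mu=\alpha}^{N} \Delta_k(\mu - \alpha)\, \frac{p(0, \mu)}{p(0, 0)}\, \xi_\mu\right) \\
&+ \Delta_k(\alpha - 1) \left(\sum_{\mu=0}^{\alpha} \Delta_k(\alpha - \mu)\, \frac{p(n, \mu)}{p(n, 0)}\, \xi_\mu - \sum_{\mu=\alpha+1}^{N} \Delta_k(\mu - \alpha - 1)\, \frac{p(n, \mu)}{p(n, 0)}\, \xi_\mu\right) \\
={}& \sum_{\mu=0}^{\alpha-1} \left(\Delta_k(\alpha - 1)\Delta_k(\alpha - \mu)\, \frac{p(n, \mu)}{p(n, 0)} - \Delta_k(\alpha)\Delta_k(\alpha - \mu - 1)\, \frac{p(0, \mu)}{p(0, 0)}\right) \xi_\mu \\
&+ \left(\Delta_k(\alpha - 1)\, \frac{p(n, \alpha)}{p(n, 0)} + \Delta_k(\alpha)\, \frac{p(0, \alpha)}{p(0, 0)}\right) \xi_\alpha \\
&+ \sum_{\mu=\alpha+1}^{N} \left(\Delta_k(\alpha)\Delta_k(\mu - \alpha)\, \frac{p(0, \mu)}{p(0, 0)} - \Delta_k(\alpha - 1)\Delta_k(\mu - \alpha - 1)\, \frac{p(n, \mu)}{p(n, 0)}\right) \xi_\mu.
\end{aligned} \tag{4}$$

For $0 \le \mu \le \alpha - 1$, $\Delta_k(\alpha - 1)\Delta_k(\alpha - \mu) \ge \Delta_k(\alpha)\Delta_k(\alpha - \mu - 1)$ and, by
Condition 1, $p(n, \mu)/p(n, 0) > p(0, \mu)/p(0, 0)$ so the coefficient of $\xi_\mu$ in (4)
is positive for all $\mu \le \alpha$. And since, for $\mu > \alpha$, $\Delta_k(\alpha)\Delta_k(\mu - \alpha)/\big(\Delta_k(\alpha - 1)\Delta_k(\mu - \alpha - 1)\big)$
can be made arbitrarily large by choosing $k$ sufficiently large
then $k_0(\mu, \alpha)$ may be defined as the smallest integer $k$ such that

$$\frac{\Delta_k(\alpha)\Delta_k(\mu - \alpha)}{\Delta_k(\alpha - 1)\Delta_k(\mu - \alpha - 1)} > \frac{p(n, \mu)\, p(0, 0)}{p(n, 0)\, p(0, \mu)}.$$

Hence, for $k \ge \max_\mu k_0(\mu, \alpha)$ the coefficient of $\xi_\mu$ in (4) is positive for all $\mu > \alpha$,
and property III is thus established by taking

$$k_0 = \max_{\alpha, \mu} k_0(\mu, \alpha).$$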
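Now let $\alpha_x^\xi$ be an integer such that $\min_{0 \le \alpha \le N} r_x(\xi, \alpha) = r_x(\xi, \alpha_x^\xi)$. Then for
$x < y$, $\alpha_x^\xi \le \alpha_y^\xi$. For suppose $x < y$ and $\alpha_y^\xi < \alpha_x^\xi$; since $r_y(\xi, \alpha_y^\xi) \le r_y(\xi, \alpha_y^\xi + 1)$
then, by II, $r_x(\xi, \alpha_y^\xi) < r_x(\xi, \alpha_y^\xi + 1)$ and then I implies the contradiction
$r_x(\xi, \alpha_x^\xi - 1) < r_x(\xi, \alpha_x^\xi)$. For $k \ge k_0$, III gives $r_n(\xi, \alpha_0^\xi + 1) < r_n(\xi, \alpha_0^\xi + 2)$
which implies, by II, that $r_x(\xi, \alpha_0^\xi + 1) < r_x(\xi, \alpha_0^\xi + 2)$ for all $x \le n$ and this
implies, by I, that $r_x(\xi, \alpha_0^\xi + \beta) < r_x(\xi, \alpha_0^\xi + \beta + 1)$ for all $\beta \ge 1$. Hence,
$\alpha_0^\xi \le \alpha_x^\xi \le \alpha_0^\xi + 1$ for all $x$. If there exists a value of $x$ such that $\alpha_x^\xi = \alpha_0^\xi + 1$ let $y$
be the least such $x$, then $0 \le y \le n$ and

$$\alpha_x^\xi = \begin{cases} \alpha_0^\xi & \text{for } x < y \\ \alpha_0^\xi + 1 & \text{for } x \ge y; \end{cases}$$

in this case, randomized Bayes solutions exist and are of the form

$$\begin{aligned}
\delta_{\alpha_0^\xi}(x) &= 1 \quad \text{for } x < y \\
\delta_{\alpha_0^\xi}(y) + \delta_{\alpha_0^\xi+1}(y) &= 1 \\
\delta_{\alpha_0^\xi+1}(x) &= 1 \quad \text{for } x > y.
\end{aligned}$$

Since every admissible procedure is a Bayes solution this completes the proof of
Theorem 2.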
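
The effect described by Theorem 2 can be seen in a small computation. The Python sketch below finds a non-randomized Bayes decision for each $x$ by minimizing $r_x(\xi, \alpha) = \sum_\mu |\alpha - \mu|^k p(x, \mu)\xi_\mu$ exactly. The family (binomial with success probability $(\mu+1)/(N+2)$), the uniform prior, and the exponent $k = 60$ are all hypothetical choices made for illustration; the constant $k_0$ of the theorem is not computed.

```python
from fractions import Fraction
from math import comb

N, n = 4, 3  # parameter values 0..N, sample points 0..n

def p(x, mu):
    """Hypothetical family with p(x, mu) > 0 everywhere: binomial(n, (mu+1)/(N+2))."""
    theta = Fraction(mu + 1, N + 2)
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def bayes_decisions(k, prior):
    """For each x, the smallest alpha minimizing r_x(prior, alpha) under loss |alpha - mu|^k."""
    decisions = []
    for x in range(n + 1):
        risks = [sum(abs(alpha - mu) ** k * p(x, mu) * prior[mu]
                     for mu in range(N + 1))
                 for alpha in range(N + 1)]
        decisions.append(risks.index(min(risks)))
    return decisions

uniform = [Fraction(1, N + 1)] * (N + 1)
print(bayes_decisions(1, uniform))   # absolute error loss
print(bayes_decisions(60, uniform))  # very rapidly increasing loss
```

With absolute error loss the decisions follow the observation across the whole parameter range, while with the large exponent the same decision is taken for every $x$, which is the degenerate behaviour the theorem describes.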
The distribution 


(5) P(x, u) = (")( - —— : Osztsn,0 Sus N 
x ] 1 


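satisfies the hypotheses of Theorem 2 for $0 < x < n$, and since Theorem 1
applies for $x = 0, n$ then

COROLLARY. If $p(x, \mu)$ is the distribution (5) then when $k \ge k_0$ a procedure $\delta$
is admissible only if

$$\begin{aligned}
\delta_\alpha(0) + \delta_{\alpha+1}(0) &= 1 \\
\delta_\beta(x) &= 1 \quad \text{for } 0 < x < y \\
\delta_\beta(y) + \delta_{\beta+1}(y) &= 1 \\
\delta_{\beta+1}(x) &= 1 \quad \text{for } y < x < n \\
\delta_\gamma(n) + \delta_{\gamma+1}(n) &= 1
\end{aligned}$$

where the integers $\alpha, \beta, \gamma$ satisfy $0 \le \alpha \le \beta \le \gamma \le N$.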


5. Admissible procedures when $W(\mu, \alpha) = |\alpha - \mu|$. If $C_k$ denotes the class
of procedures which are admissible when $W(\mu, \alpha) = |\alpha - \mu|^k$ then $C_k$ is contained
in the class $C$ of Theorem 1. As demonstrated by Theorem 2, however,
when $k$ is sufficiently large the class $C_k$ may reduce to a collection of procedures
which virtually designate the same decision for all values of $x$, so in this case
little significance could be attached to the mere fact that a procedure belonged to
the class $C$. Since $W(\mu, \alpha) = |\alpha - \mu|^k$ is a conventional type of loss function
for estimation problems Theorem 2 therefore raises a question of the practical
importance of the class $C$; hence, it is of special interest that

THEOREM 3. If $p(x, \mu)$ satisfies Conditions 1 and 2 and if the sample space $M$
is finite then the class $C_1$ of procedures which are admissible relative to
$W(\mu, \alpha) = |\alpha - \mu|$ is the class $C$ itself.

PROOF OF THEOREM 3. If a member $\delta$ of $C$ is inadmissible then, since $C$ is a
complete class, there exists a member $\delta'$ of $C$ which is better than $\delta$. Then for all
possible $\mu$

$$r(\mu, \delta) - r(\mu, \delta') = \sum_x p(x, \mu)\, \big(|\bar\alpha_x - \mu| - |\bar\alpha'_x - \mu|\big) \ge 0. \tag{6}$$

Theorem 3 is proved by showing that (6) cannot hold for all possible $\mu$; and, in
particular, that there exists $x$ in $M$ such that either


r(az,6) > r(az,6’) or r(az+ 1,6) > r(a, + 1, 8’). 


Only the ordering of the sample space is pertinent so, without loss of generality,
let $M = (0, 1, \ldots, n)$ and let $|\bar\alpha_x - \bar\alpha'_x| = \Delta_x > 0$ for all $x$ in $M$. Let
min,(z | & < &,)
max,(z | &, > &-) 
(x|\d <x <eand a “Gs & =<. a.) 
(a, Yr, °° yYma)s Yi < ys fort <j 
(x|d <x <eand &) < &-1 S & < &,) 
= (2,22, °** 2m), 21 < 2; fori <j 
then let yo = d, ym = e, and 
ai = Vart+ 1 = y, fori =0,1,--- 
= V9; + l= 2i4+1 for7 = 0, L. 
= a,,, fort = 0,1,---,m— 1 
Mei+t = Oe,;,, + 1 (or as,;,, if ae.,,, = Hoo,.1) 
fort = 0,1, --- 
Since 6 and 6’ are in C then a, < &, and &, S @, for all x < y so 
Uo: S vai < Usar S Vaia1 for? 
Mo < Moi-1 Mei Moi+1 for? 


and for 2k—1 < q S 2k 


uo—l Ms 


2k—1 
t (ue, 8) — r(ug, 8’) = — Dd plz,u.)A, + D (-1)' > plx 
z=0 i=0 z=u, 


2m 


1 vi 
_ 7 (—1)' > p(x, wg) Az 


i=2k 


bi(ug) = z. p(x, bg) Az 


2(i+7)-1 


Bila) = Do (—1)'bi(ug) 


t=21 


then, since $\Delta_x > 0$ for all $x$,

$$r(\mu_q, \delta) - r(\mu_q, \delta') \le B_{0,k}(\mu_q) - B_{k,m-k}(\mu_q).$$

A contradiction to (6) is then obtained by showing that there exists a pair
$(k, q)$ such that $2k - 1 \le q \le 2k$ and $B_{0,k}(\mu_q) < B_{k,m-k}(\mu_q)$.

Let $S_i(k_j)$ be the following statement, defined for all integer pairs $(i, j)$ such
that $0 \le i \le m - j$, $1 \le j \le m$.

$S_i(k_j)$: There exists an integer pair $(k_j, q(i, j))$ such that $0 \le k_j \le j$,
$2(i + k_j) - 1 \le q(i, j) \le 2(i + k_j)$, and $B_{i,k_j}(\mu_{q(i,j)}) < B_{i+k_j,\,j-k_j}(\mu_{q(i,j)})$.

The negation of $S_i(k_j)$, written not $S_i(k_j)$, is then

not $S_i(k_j)$: For every integer pair $(k, q)$ such that $0 \le k \le j$ and
$2(i + k) - 1 \le q \le 2(i + k)$,

$$B_{i,k}(\mu_q) \ge B_{i+k,\,j-k}(\mu_q).$$

The desired contradiction to (6) may then be written $S_0(k_m)$.
The statement S,(k;) is easily proved by contradiction. Note first that since 6 
belongs to C then 
pur: , uri) > 0 fort = 0,1,---,m—1 
(7) 
P(V2i-a » Mai 1) > 0 for [= 


so that 


(8) bin) (E) 0 or (7); 


If be:4;(u2:) > O then there exists 2; = Uei4; such that p(x , w2:) > 0, and since 
p(uei, wei) > O then, by Lemma 1, 


(9a) if bois (ue) > O then p(z, wei) > Ofor ua, S x u 


2i+) 


Similarly, if be:(u2i4;) > 0 then there exists x» S ve; such that p(2xo , wei+;) > 0. 
By (7), however, there exists 2; 2 voi; such that p(x , wei4;) > 0; hence, by 
Lemma 1, 


(9b) if boi(uoi4;) > Othen p(z, wei+;) > Oforve, S x 


It then follows that 


(10) if By i(u2:) SO then By s(ueiz;) (S) 0 for 7 (5) ] 


The statement in (10) is easily seen to hold for all j 21 such that 
either bei(uoi4;) = 0 or beisi(ueis;) = O, for if bei(uei4;) = 0 then (8) implies (10) 
and if be; 41(moi4,;) = 0 then p(tei41 , wei4;) = O but, by (7), there exists 2, > vais 
such that p(x , wei4;) > Oso, by Lemma 1, p(2, wois;) = Ofor all x S uei4; and, 
in particular, be:(u2.4;) = 0. Now suppose that both be;(ue2i.;) > O and bei41(ueis;) 
> 0 but that (10) does not hold; in particular, suppose be;.:(u2:) 2 bei(ue;) and 
bo(uei4;) 2 beias(uoi4;). Since, by (8), bei(ue:) > O then bei4:(u2:) > O and, by 
(9a), p(x, ue.) > O for w; S x S Uei4s ; and since bei(u2:,;) > O then, by (9b), 
P(x, wai4;) > O for va S x S ui4;. Then, by Condition 1, 


be (wei) boi(uoi4,) - 
P(va, ’ bei) P( V2 >» Ba +i)" 
but then the assumption that be:(ueis;) 2 beigs(uei4;) implies 


9:41 (us) bois1(uai+s) 
P( v2: ’ Mei) P(v2i ’ Mei4;) E 


which contradicts Condition 1. Hence (10) holds for all 7 2 1. The statement 
not $S_i(k_1)$ implies that $B_{i,1}(\mu_{2i}) \le 0$ and $B_{i,1}(\mu_{2i+1}) \ge 0$ and is therefore a contradiction
of (10); hence, $S_i(k_1)$ for all $i$ such that $0 \le i \le m - 1$.

Now suppose S,(k;) for all (7, 7) such that 0 S i S m—j,1 Sj <s S mbut 
not S,(k,), where 0 S h S m — s. Then (hk; , g(h, 1)) can be chosen as (0, 2h); 
otherwise By,:(u2n) S 0 and, since 2h < 2(-h+1+h1) —1 S qh+1,s — 1), 
then, by (10), Basr(eoassey) S O which, together with the assumption 
Srai(k,1) implies the contradiction S,(k,). 

If for all 7 < g < s, (k; , q(h,7)) can be chosen as (0, 2h) then (k, , g(h , g)) can 
be chosen as (0, 2h). Otherwise, By,.o(u2,) S 0 and, since S,4,(k,-,) but not 


Silks), Brio(uon+e.e—0)) > 0. And since p(s, , un) > O then p(x, uw») > O for 
Un S xX S Vonsy)-1 ; Otherwise, by Lemma 1, p(x, wx.) = 0 for all = venga, 
and then (kj; , g(h, g — 1)) = (0, 2h) implies that (k, , q(h, g)) can be chosen 
as (0, 2h). Also, p(vanstoy, Mocntoe-o)) > 0; otherwise, by Lemma 1, 
P(r, Meth+ge—9)) = O for all © S vanyg-1) SINCE Vo—nzg—1) < Vochgg+k,—u)—1 
< Uonso+k,_.) > and then By o(ugianige—g)) LS O to contradict not S,(k,). Hence, 
(11) — _Brroralun) 5 By, o1(u2n) 


P(V2(r+0—1) » Mon) 7 P(V2(K40—1) » Hen) 


and 


Br+o—1.(Hair+o.0—0)) 


(12) _ Bro-(wartou-o) > 


P(V2n+0-1) » Mathto.e—0)) PC V2cn+0—1) » Math 0.2—0)) 


Observe, however, that if 


: (10 
(13) By lum) __ Bails) 
P( Venez) » Ber) P(en4 ij) » Ma) 


where 1 S j S g — 1, 2h < q, and p(z,u,) > 0 for rari S FS veny) then, 
by Condition 1, 


(14) By +3.1(on) % Busi (ua) 
P(Vach+i) » Men) P(V2n+5) » Ha) 
and, since By, ;(u) + Basja(u) = Br, j4i(u), 


By i+1(uen) By, i+1(ug) 


(15) ——— -. 
P(r +j)9 Mon) P(van4 Jd)» Ma) 


Since (kj4: , g(h, 7 + 1)) can be chosen as (0, 2h) then By, ;4:(u2,) > 0, and since, 
by Condition 1, p(v2a4j , uen)p(enes+) » Ha) > P(V2n+5) » Ha) P(2n+541) » Men) then 


By,,i+1(uon) oe plz had) 6 Bon ) ; Bn.j41(uen) 


P(V2n4542) » Mr) P(van4i41) » Men) P( Vang) » Men) 


(16) 
D(e2043) > Ha), Brittle) _Br,s+i(ua) 
P(V2n45-+41) ’ Ma) D( Var +j) 9 Mg) P( Vr +j+1) » Ha) 
Thus, if (13) then (16). Now let 7’ be the least 7, 1 S 7 S g — 1, such that 
P(V2(h+3) 5 Moth+g.e-g)) > O. Then (13) holds for 7 = 7’ and q = q(h+ 9,8 — g), 
for if 7’ = 1 and pv, uorrigu—o)) > 0, then let 7 = O in (14), (15), (16) to get 
the desired result; otherwise, if 7’ 2 1 and p(vx , ugn4o.e-)) = 0 then the right 
side of (13) is nonpositive while the left side is positive since (kj, , g(h, j’)) can 
be chosen as (0, 2h). Hence, by finite induction, (13) holds for 7 = g — 1, 
q = qh+g9,8 — g);ie., 


Br o1(us) Br oa(arsoe—0) 


P(V2(n+0—1) » Men) P(V20n4 g—1) » Ha(h+g.2—g)) 


Hence, by (11) and (12), 


Baro 1,1(jt2n) ion Brso alga g)) 


P(V2A40- 1) » Man) P(V2n40-1) » Bath+o.2—v)) 


in contradiction to Condition 1. This proves that if S,(k;) for all (7,7) such that 
0Osism-—j,l Sj <s Sm but not S,(k,) then (/;, g(h, j)) can be chosen 
as (0, 2h) for all 7 such that 1 S 7 < 8 < m. 

With this result, however, simply take 7 = s — 1, q = 2(h + s)—1 in (13) 
to get 


By, »—-1 (un) Mei Sn By »—1(uacn4e)—1) 


P(Va(n4e- 1) » Men) P(Ve(n4e—1) » Mo(h+-s)—1) ; 
The denominator p(vea4s—1) , Men+e)-1) Must be positive; otherwise, since 
P(Voin40)-1, Henge) > O then p(x, wonseyr) = O for all  S ve44s.-1) so that 
Bye (uensey a) = —densey—a(ueneya) < 0 to contradict the assumption not 
S,(k,). Then not S,(k,) gives, as before, 
Bas 1.1 (won) i Bhs 1 (won) 
p( vs h+e—1) 5 bon) ~ p(vs h+e—1) » Mon) 


By 1(Macnts) 1) is Bre 1,1(Mon+e)—1) 


- P(V2n4e—) ’ Mo(h+«)—1) 


. P(V2n4+0 -1) 5 Moih+8)—1) 


to contradict Condition 1. Hence, S,(k,). And since S,(k;) for all ¢ such that 
0 < is m-— 1 then, by finite induction, S;(k;) for all (7, 7) such that 0 S 7S 
m—j,1 S73 S m. In particular, So(k,,), which establishes Theorem 3. 


6. Minimax procedures when $W(\mu, \alpha) = |\alpha - \mu|$. When the minimax
estimator does not have constant risk, as is obviously the general case here, 
then the Bayes method of finding the minimax procedure by guessing a least 
favorable a priori distribution becomes extremely difficult, if not hopeless. For 
distributions of the type considered here, however, it is possible to reduce the 
problem of guessing a least favorable a priori distribution to one of guessing 
which points in the parameter space are assigned positive probability by a least 
favorable a priori distribution. 

THEOREM 4. If $p(x, \mu)$ satisfies Conditions 1, 2, and 3 and $W(\mu, \alpha) = |\alpha - \mu|$
then there exists a least favorable a priori distribution which assigns positive probability
to at most $n + 2$ values of $\mu$, $\mu_0 \le \mu_1 \le \cdots \le \mu_{n+1}$, and if $\delta$ is a Bayes
solution with respect to a least favorable a priori distribution then $\mu_x \le \bar\alpha_x \le \mu_{x+1}$ for
$x = 0, 1, \ldots, n$.

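PROOF OF THEOREM 4. Assume, without loss of generality, that
$M = (0, 1, \ldots, n)$. Let

$$r_x(\xi, \alpha) = \sum_{\mu=0}^{N} |\alpha - \mu|\, p(x, \mu)\, \xi_\mu$$

and let $\alpha_x^\xi$ be the collection of integers

$$\alpha_x^\xi = \big(\alpha,\ 0 \le \alpha \le N \mid \inf_\beta r_x(\xi, \beta) = r_x(\xi, \alpha)\big).$$

From the proof of Theorem 2 the function $r_x(\xi, \alpha)$ has the properties

I': if $r_x(\xi, \alpha) \le r_x(\xi, \alpha + 1)$ then $r_x(\xi, \alpha + \beta) \le r_x(\xi, \alpha + \beta + 1)$ for all $\beta \ge 0$

II': if $r_y(\xi, \alpha) \le r_y(\xi, \alpha + 1)$ then $r_x(\xi, \alpha) < r_x(\xi, \alpha + 1)$ for all $x < y$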

£ e 
Hence, a: has the form 


g g g E g 
az = (az ,az + 1,°°- , az + Bz) 


n £ , , £ £ . 
where 05 a; SN, 05 <pi<N—oi, and af + B < af,,. Furthermore, 
° £ ° ° 
since r,(é, a®) = 7,(€, ai + l)=--- =7,(f,a: + Bs) or,forz = 1, ---: Be — |, 


a}+i-—l 


p(x, af + ities + . Pix, we = 2 plx, dé, 


“ as+i+l 


and 


N a$+i-l 


Zz pz, wk = >, p(x, w& + pla, af + I) Eas 
p=al+i+l1 u=0 


then p(a, ak + t)fat+i = Ofor0 <i < B, . Hence, since r,(€, at — 1) >r,(é, af), or 
Ny af-1 
£ f 
pa, of)Eag + p(z,af+ Bair t LD plewh> D p(x, ws 
pmaf+pg—1 sO 
and r,(é, a& + Bf) < r,t t, a8 + BE + 1), or 
N 
Zz p(x, ut, < > pla + plz, as)Eat + p(x, af + Be)Eat+s; , 
w= a$+8§+1 u=0 
then p(z, az )Ea§ > 0 and p(x, af + B) JEag+ag > O. i were since p(x, af) > 0 
and p(z, a + at) > Oimply, by Lemma 1, that p(z, a+i)>Ofor0<i< Be, 
then 


> Ofori = 0 
(17) +i) = Ofor0 <i < ZB; 
> Ofori = 8: 


If $\xi'$ is a least favorable a priori distribution, i.e., if $\xi'$ maximizes $\inf_\delta r(\xi, \delta)$,
then since, for a fixed $\delta$, $r(\xi, \delta)$ is linear in $\xi$, every $\xi$ which satisfies
(18) r,(t, af — 1) > r.(t, af’) = --- = r,(t, af + Be’) S r.(é, af + Bf + 1) 


for x = 0, 1, --- , n is likewise a least favorable a priori Giateination. But since 
p(x, w) satisfies C ‘onditions i. 2, andj 3, and, for overy z> 0, o(z—-— 1, af. it pe 1) 
> 0 and p(z, af’ ) > O, where at. on a < at ; the n forevery x > Othere 
exists an integer yu, such that af. os pe, Sie sat and both p(x — 1, uz) 
> 0 and p(z, uz) > 0. Let (uz), x = 1,2, +--+, nm, we S wes, be a sequence of 
such integers and define uy = af and ati = at, + B® . Then every — which 
satisfies 











(19) r-(&, uz) = r2(€, uz +1) = = r,(f, 241) 





for x = 0,1, ---, also satisfies (18) and has & = O for wz < uw < wea: for 


It remains, then, to show that a solution £’ to (19) exists and may be chosen 
so that t) = 0 for wu < yo and for w > pas. This, however, follows directly 
from Theorem 3, for the problem of proving the existence of such a £’ is easily 
seen to reduce to the problem of proving that a set of equations of the form 


> p(zjwt& = Dd p(x, w&,x =0,1,---,m—1 


p—0 per+l 






where p’(x, w) satisfies Conditions 1 and 2 forz = 0,1, ---,n 2m—1l, p= 


Sb ++, m, and p'(x, x) > Oand p’'(z, x + 1) : 0, has a solution & = (&,) such 
that & > 0, wu = 0,1, ---, m, and >a « — = 1, and this may be viewed as a 
special case of Theorem 3 with N = m and n , m — 1. Theorem 3 then asserts 
that a procedure 6 with 6,(7) + 6:4:(7) = 1, 6,(27) < 1, forr = 0,1,---,m-— 1 
and 6,,(2) = 1 for x 2 m is admissible, and therefore 6 is a Bayes solution rela- 
tive to some a priori distribution € and, by (17), & > 0 for uw =0,1,---, m. 
Hence, a ¢’ of the desired form exists and the theorem is established. 

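The construction of the minimax procedure $\delta$ is easily accomplished once the
integer $\mu_x$ is known for every $x$. $\delta$ is defined by $(\bar\alpha_0, \bar\alpha_1, \ldots, \bar\alpha_n)$ which is uniquely
determined by the equations

$$r(\mu_x, \delta) = \sum_{y < x} p(y, \mu_x)\, (\mu_x - \bar\alpha_y) + \sum_{y \ge x} p(y, \mu_x)\, (\bar\alpha_y - \mu_x) = r(\mu_0, \delta).$$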





7. Discussion. The requirement that an estimator of an integer-valued 
parameter must itself be integer-valued is almost a logical necessity in any 
rigorous approach to the estimation problem. For practical purposes, of course, 
such a requirement has been regarded as an unnecessary refinement, and statisti- 
cians conventionally estimate an integer-valued parameter by means of a real- 
valued statistic, presenting as their estimate either the real number itself or the 
nearest integer. The problem is frequently encountered, for example, in such a
form that the statistician wishes to present an estimate of the fraction $\mu/N$.
Certainly, division by the known constant $N$ is a trivial alteration of the estimation
problem; it would be unheard of, however, to require in this case that the
estimate assume one of the values $0/N, 1/N, \ldots, N/N$.

If real-valued procedures are allowed then when loss is absolute error the
randomized, integer-valued procedure $\delta$ is equivalent to the non-randomized
procedure which estimates the real number $\bar\alpha_x$ when $x$ is observed. Any optimum
property ascribed to an integer-valued procedure therefore applies to its real-valued
counterpart so, as a corollary to Theorem 3, when real-valued procedures
are allowed then the class of non-randomized real-valued procedures derived from
the class $C$ in the above manner is a minimal essentially complete class. Likewise,
if $\delta$ is the minimax integer-valued procedure then the non-randomized real-valued
procedure $\bar\alpha$ is also minimax. Theorems 3 and 4 thus remain essentially
unaffected by the introduction of real-valued procedures.
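
The equivalence claimed above rests on a one-line identity: for an integer $\mu$, randomizing between $\alpha$ and $\alpha + 1$ with probabilities $1 - w$ and $w$ has expected absolute error $|\alpha + w - \mu|$. The Python sketch below checks this exactly on a small grid; the function names and the grid are illustrative only.

```python
from fractions import Fraction

def randomized_loss(alpha, w, mu):
    """Expected absolute error when alpha is chosen w.p. 1 - w and alpha + 1 w.p. w."""
    return (1 - w) * abs(alpha - mu) + w * abs(alpha + 1 - mu)

def real_valued_loss(alpha, w, mu):
    """Absolute error of the real-valued estimate alpha + w."""
    return abs(alpha + w - mu)

weights = (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1))
identity_holds = all(
    randomized_loss(alpha, w, mu) == real_valued_loss(alpha, w, mu)
    for alpha in range(5) for mu in range(6) for w in weights
)
print(identity_holds)
```

The identity depends on $\mu$ being an integer: for $\mu = \alpha + \tfrac12$ and $w = \tfrac12$ the randomized loss is $\tfrac12$ while the real-valued loss is 0.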
MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERS
SUBJECT TO RESTRAINTS


By J. AITCHISON AND S. D. SILVEY
University of Glasgow


Summary. The estimation of a parameter lying in a subset of a set of possible 
parameters is considered. This subset is the null space of a well-behaved function 
and the estimator considered lies in the subset and is a solution of likelihood 
equations containing a Lagrangian multiplier. It is proved that, under certain 
conditions analogous to those of Cramér, these equations have a solution which 
gives a local maximum of the likelihood function. The asymptotic distribution of 
this ‘restricted maximum likelihood estimator’ and an iterative method of solv- 
ing the equations are discussed. Finally a test is introduced of the hypothesis 
that the true parameter does lie in the subset; this test, which is of wide appli- 
cability, makes use of the distribution of the random Lagrangian multiplier 
appearing in the likelihood equations. 


1. Introduction. Quite frequently in statistical theory the natural way of 
building up a mathematical model of an experiment leads to the description of 
the experiment by a random variable X whose distribution function F depends
on s parameters $\theta_1, \theta_2, \ldots, \theta_s$ which are not mathematically independent
but satisfy r functional relationships $h_i(\theta_1, \theta_2, \ldots, \theta_s) = 0$, $i = 1, 2, \ldots, r$,
$r < s$. In many cases where such a natural description arises it is possible to
solve the r equations $h_i(\theta_1, \theta_2, \ldots, \theta_s) = 0$ for r of the parameters in terms of
the remaining s — r, to express the distribution function F in terms of these 
remaining parameters only and, given observations on X, to estimate these 
s — r unrestricted parameters by the method of maximum likelihood. This 
procedure has two disadvantages. First, it may be impossible to express r of the 
parameters explicitly in terms of the remaining s — r and second, interest may 
lie in estimating all of the parameters simultaneously, in which case a sym- 
metrical procedure for so doing is certainly desirable. The natural symmetric 
method for maximum-likelihood estimation in this case is achieved by the in- 
troduction of Lagrangian multipliers and it is this method that we will consider 
in this paper. 


2. Formulation of the problem. In this section we will formulate more pre- 
cisely the problem to be considered. 


We will denote m-dimensional Euclidean space by $\mathcal{R}^m$, $m = 1, 2, 3, \ldots$.

A point in $\mathcal{R}^s$, denoted by $\theta = (\theta_1, \theta_2, \ldots, \theta_s)$, will represent a value of a
parameter. There is a particular point $\theta_0 = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_s^{(0)})$ in $\mathcal{R}^s$ which is
the true, though unknown, parameter value. Corresponding to each $\theta$ in some
neighbourhood of $\theta_0$, say in $U_\alpha = \{\theta : \|\theta - \theta_0\| \le \alpha\}$, is a probability density function
$f_\theta$ defined on $\mathcal{R}^1$ and we will denote the value of $f_\theta$ at the point $t \in \mathcal{R}^1$ by $f(t, \theta)$.
The probability density function $f_{\theta_0}$ defines a probability measure on $\mathcal{R}^1$ and we
will assume that, with respect to this measure, for almost all t, the partial
derivatives $\partial \log f(t, \theta)/\partial\theta_i$, $i = 1, 2, \ldots, s$, exist for every $\theta$ in $U_\alpha$.

There is given a continuous function h from $\mathcal{R}^s$ into $\mathcal{R}^r$, $r < s$, defined by
$h(\theta) = (h_1(\theta), h_2(\theta), \ldots, h_r(\theta))$, which is such that, for every $\theta$ in $U_\alpha$, the
partial derivatives $\partial h_j(\theta)/\partial\theta_i$, $i = 1, 2, \ldots, s$, $j = 1, 2, \ldots, r$, exist. The function
h has the further property that $h(\theta_0) = 0$.

A point in $\mathcal{R}^n$ denoted by $x = (x_1, x_2, \ldots, x_n)$ will be regarded as representing
a set of n independent observations on a random variable whose probability
density function is $f_{\theta_0}$, and we use the fact that points in $\mathcal{R}^n$ are being so regarded
to define, in the usual way, a probability measure on $\mathcal{R}^n$, for each n. Subsequent
statements regarding the probabilities of sets in $\mathcal{R}^n$ will refer to this particular
probability measure.

It will be convenient to use also matrix representation for points in $\mathcal{R}^m$ and
for linear operators from one Euclidean space to another and we will use the
convention that, for example, $\theta$ is also the $s \times 1$ column vector representing the point
$\theta$ in $\mathcal{R}^s$, and H, an $s \times r$ matrix, represents a linear operator H from $\mathcal{R}^r$ into $\mathcal{R}^s$.


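The log-likelihood function L is defined on a subset of $\mathcal{R}^n \times \mathcal{R}^s$ by

$L(x, \theta) = \sum_{i=1}^{n} \log f(x_i, \theta).$

If $H_\theta$ denotes the $s \times r$ matrix $(\partial h_j(\theta)/\partial\theta_i)$, and if $\lambda$ is a Lagrangian multiplier
in $\mathcal{R}^r$, then we propose to estimate the unknown parameter $\theta_0$ by a solution, if
such exists, of the equations

(2.1)   $l(x, \theta) + H_\theta \lambda = 0,$
(2.2)   $h(\theta) = 0,$

where $l(x, \theta)$ is the point in $\mathcal{R}^s$ whose ith component is $\partial L(x, \theta)/\partial\theta_i$.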
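As a concrete illustration of (2.1) and (2.2), the following Python sketch evaluates their left-hand sides for a simple multinomial model. The model, the function name `lagrangian_residual` and the sample counts are assumptions made for illustration and do not come from the paper: an observation falls in cell i with probability $\theta_i$, so that $L = \sum_i n_i \log \theta_i$, and the single restraint is $h(\theta) = \sum_i \theta_i - 1 = 0$, which makes $H_\theta$ a column of ones.

```python
def lagrangian_residual(counts, theta, lam):
    """Left-hand sides of (2.1) and (2.2) for the illustrative multinomial model.

    counts[i] is the number of observations in cell i; the restraint is
    h(theta) = sum(theta) - 1, so H_theta is a column of ones (r = 1).
    """
    score = [c / t for c, t in zip(counts, theta)]  # dL/dtheta_i = n_i / theta_i
    eq21 = [s + lam for s in score]                 # l(x, theta) + H_theta * lambda
    eq22 = sum(theta) - 1.0                         # h(theta)
    return eq21, eq22


counts = [2, 3, 5]
n = sum(counts)
theta_hat = [c / n for c in counts]   # candidate restricted estimate n_i / n
lam_hat = -float(n)                   # candidate multiplier
eq21, eq22 = lagrangian_residual(counts, theta_hat, lam_hat)
```

In this example both residuals vanish at $\theta_i = n_i/n$, $\lambda = -n$, so the familiar cell proportions, together with this multiplier, solve the two equations.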

We will show that, under certain fairly general conditions, if x belongs to a
set whose probability measure tends to 1 as $n \to \infty$, these equations have a
solution $\hat\theta(x)$, $\hat\lambda(x)$, where $\hat\theta(x)$ is near $\theta_0$ and $\hat\theta(x)$ maximises $L(x, \theta)$ subject to
the condition $h(\theta) = 0$. The definition of $\hat\theta$ and $\hat\lambda$ will then be extended in a
natural way to the whole of $\mathcal{R}^n$ and we will show that the random variables thus
defined are asymptotically jointly normally distributed. We will then consider
an iterative procedure for solving the equations (2.1) and (2.2). Finally tests of
the adequacy of the model will be introduced.


3. Existence of a solution. The proof that we will give of the existence of a 
solution of the equations (2.1) and (2.2) is based on the same principle as a proof 
given by Cramér [2] of the existence of a maximum likelihood estimate of a 
parameter in $\mathcal{R}^1$. However the presence of the restraining condition $h(\theta) = 0$ in
the situation we are discussing makes our proof more intricate in detail than a
straightforward generalisation of Cramér's proof to a parameter in $\mathcal{R}^s$ would
be. We start by indicating the main lines of the proof.

We set out to show that, under certain conditions, if $\delta$ is a sufficiently small
given number and if n is sufficiently large, then, for a set of x whose probability
measure is near 1, the equations (2.1) and (2.2) have a solution $\hat\theta(x)$, $\hat\lambda(x)$, where
$\hat\theta(x) \in U_\delta$. We will demand that in $U_\alpha$ the function $\log f(t, \cdot)$ should possess
partial derivatives of the third order and the components of the function h
should possess partial derivatives of the second order. Then it will be possible,
by expanding the components of $l(x, \theta)$ and $h(\theta)$ about $\theta_0$, to express the
equations (in matrix notation) in the form

(3.1)   $l(x, \theta_0) + M_{x,\theta_0}(\theta - \theta_0) + v^{(1)}(x, \theta) + H_\theta \lambda = 0,$
(3.2)   $H'_{\theta_0}(\theta - \theta_0) + v^{(2)}(\theta) = 0,$

where

(i) $M_{x,\theta_0}$ is the matrix $(\partial^2 L(x, \theta_0)/\partial\theta_i\,\partial\theta_j)$,

(ii) $v^{(1)}(x, \theta)$ is a vector whose mth component may be expressed in the
form $\tfrac{1}{2}(\theta - \theta_0)' L_m (\theta - \theta_0)$, $L_m$ being the matrix $(\partial^3 L(x, \theta^{(m)})/\partial\theta_m\,\partial\theta_i\,\partial\theta_j)$,
$i, j = 1, 2, \ldots, s$, and $\theta^{(m)}$ a point such that $\|\theta^{(m)} - \theta_0\| < \|\theta - \theta_0\|$,

(iii) $v^{(2)}(\theta)$ is a vector whose mth component is $\tfrac{1}{2}(\theta - \theta_0)' H_m (\theta - \theta_0)$, $H_m$
being the matrix $(\partial^2 h_m(\theta^{[m]})/\partial\theta_i\,\partial\theta_j)$, $i, j = 1, 2, \ldots, s$, and $\theta^{[m]}$ a point
such that $\|\theta^{[m]} - \theta_0\| < \|\theta - \theta_0\|$.

Further conditions imposed on f, which are almost a straightforward
generalisation of Cramér's conditions [2], will ensure that, for large enough n, there is a
set of x whose probability measure is near 1 such that, if x belongs to this set,

(i) $\|(1/n)\, l(x, \theta_0)\|$ is small,

(ii) $-(1/n) M_{x,\theta_0}$ is near a certain positive definite matrix $B_{\theta_0}$, and

(iii) the elements of $(1/n) L_m$ are bounded for $\theta \in U_\delta$.

By dividing (3.1) by n we will then be able to express this equation in the form

(3.3)   $-B_{\theta_0}(\theta - \theta_0) + \frac{1}{n} H_\theta \lambda + \delta^2 v^{(3)}(x, \theta) = 0,$

where $\|v^{(3)}(x, \theta)\|$ is bounded for $\theta \in U_\delta$. In addition we will demand that, for
$\theta \in U_\alpha$, the second order derivatives of the components of h should be bounded.
Then we will be able to express (3.2) in the form

(3.4)   $H'_{\theta_0}(\theta - \theta_0) + \delta^2 v^{(4)}(\theta) = 0,$

where $\|v^{(4)}(\theta)\|$ is bounded for $\theta \in U_\delta$.

If the equations (3.3) and (3.4) have a solution, then by pre-multiplying (3.3)
by $H'_{\theta_0} B_{\theta_0}^{-1}$ and substituting for $H'_{\theta_0}(\theta - \theta_0)$ from (3.4) we find that the values
of $\theta$ and $\lambda$ satisfying these equations also satisfy an equation of the form

(3.5)   $H'_{\theta_0} B_{\theta_0}^{-1} H_\theta \left(\frac{1}{n}\lambda\right) + \delta^2 v^{(5)}(x, \theta) = 0.$

We will impose conditions on h which ensure that the matrix $H'_{\theta_0} B_{\theta_0}^{-1} H_\theta$ is
non-singular and the elements of its inverse are bounded functions of $\theta$ for $\theta \in U_\delta$.
Then it will be possible to solve equation (3.5) for $\lambda$ in terms of $\theta$ and on
substitution in (3.3) we will obtain the result that any value of $\theta$ in $U_\delta$ for which
equations (3.3) and (3.4) are satisfied is also a solution of an equation of the
form

(3.6)   $-B_{\theta_0}(\theta - \theta_0) + \delta^2 v(x, \theta) = 0,$

where $\|v(x, \theta)\|$ is bounded for $\theta \in U_\delta$.

Conversely it will be shown that if the equation (3.6) has a solution $\hat\theta(x) \in U_\delta$
then $\hat\theta(x)$ leads to a solution $\hat\theta(x)$, $\hat\lambda(x)$ of equations (2.1) and (2.2). We will then
use the fact that $B_{\theta_0}$ is a positive definite matrix to prove that, if $\delta$ is sufficiently
small, (3.6) has a solution in $U_\delta$.

This outline of the method of proof to be adopted provides the motivation for 
the introduction of conditions on f and h which we now discuss. 

Conditions on f. The following conditions on the function f appear complicated 
and restrictive from the mathematical point of view. In fact they will be satis- 
fied in most practical estimation problems. 


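$\mathcal{F}1$. For every $\theta \in U_\alpha$ and for almost all $t \in \mathcal{R}^1$ (almost all with respect to the
probability measure on $\mathcal{R}^1$ defined by $f_{\theta_0}$), the derivatives

$\frac{\partial \log f(t, \theta)}{\partial\theta_i}, \quad \frac{\partial^2 \log f(t, \theta)}{\partial\theta_i\,\partial\theta_j} \quad \text{and} \quad \frac{\partial^3 \log f(t, \theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k},$

$i, j, k = 1, 2, \ldots, s$, exist, and the first and second order derivatives are continuous functions of $\theta$.

$\mathcal{F}2$. For every $\theta \in U_\alpha$ and for $i, j = 1, 2, \ldots, s$, $|\partial f(t, \theta)/\partial\theta_i| < F_1(t)$ and
$|\partial^2 f(t, \theta)/\partial\theta_i\,\partial\theta_j| < F_2(t)$, where $F_1$ and $F_2$ are finitely integrable over $(-\infty, \infty)$.

$\mathcal{F}3$. For every $\theta \in U_\alpha$ and $i, j, k = 1, 2, \ldots, s$, $|\partial^3 \log f(t, \theta)/\partial\theta_i\,\partial\theta_j\,\partial\theta_k| < F_3(t)$,
where $\int_{-\infty}^{\infty} F_3(t) f(t, \theta_0)\, dt$ is finite and equal to $\kappa_1$, say.

$\mathcal{F}4$. $b_{ij} = \int_{-\infty}^{\infty} \frac{\partial \log f(t, \theta_0)}{\partial\theta_i}\, \frac{\partial \log f(t, \theta_0)}{\partial\theta_j}\, f(t, \theta_0)\, dt$
is finite for $i, j = 1, 2, \ldots, s$, and the matrix $B_{\theta_0} = (b_{ij})$ is positive definite with
minimum latent root $\mu_0$.

The conditions $\mathcal{F}3$ and $\mathcal{F}4$ are apparently less stringent than a straightforward
generalisation of Cramér's corresponding conditions would be. In §6 we return
to this point.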


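If f satisfies these conditions then for any given positive numbers $\delta < \alpha$ and
$\epsilon < 1$ and for sufficiently large n, say $n \ge n(\delta, \epsilon)$, there exists a set $X_n \subset \mathcal{R}^n$
with the properties

$\mathcal{X}1$. $\Pr\{X_n\} > 1 - \epsilon$.

$\mathcal{X}2$. $\|(1/n)\, l(x, \theta_0)\| < \delta^2$, if $x \in X_n$.

$\mathcal{X}3$. If $x \in X_n$, $(1/n) M_{x,\theta_0}$ can be expressed in the form $-B_{\theta_0} + \delta m_{x,\theta_0}$,
where $m_{x,\theta_0}$ is an $s \times s$ matrix the moduli of whose elements are bounded by 1.

$\mathcal{X}4$. For every $\theta \in U_\alpha$ and $i, j, k = 1, 2, \ldots, s$,

$\left| \frac{1}{n} \frac{\partial^3 L(x, \theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \right| < 2\kappa_1, \quad \text{if } x \in X_n.$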

The proof of these results is similar to the proof of the corresponding results
given by Cramér [2] in the case of a parameter in $\mathcal{R}^1$ and we merely remark
that the conditions $\mathcal{F}1$-4 imply (as they are designed to imply) that

(i) $(1/n)\, l(\cdot, \theta_0)$ converges in probability to $0 \in \mathcal{R}^s$,

(ii) $(1/n) M_{\cdot,\theta_0}$ converges in probability to $-B_{\theta_0}$, and

(iii) if $G(x) = (1/n) \sum_{i=1}^{n} F_3(x_i)$, then the random variable G converges in
probability to $\kappa_1$, and

$\frac{1}{n} \left| \frac{\partial^3 L(x, \theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \right| = \frac{1}{n} \left| \sum_{i=1}^{n} \frac{\partial^3 \log f(x_i, \theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \right| < G(x),$

by $\mathcal{F}3$.

In future when we refer to a set $X_n$ we imply that n is sufficiently large for the
existence of a set in $\mathcal{R}^n$ with the properties $\mathcal{X}1$-4 and that the set $X_n$ referred
to has these properties.

As has already been indicated, one of the main purposes of the introduction of
the conditions $\mathcal{F}$ was to ensure that (3.1) could be expressed in the form (3.3).
Now if the conditions $\mathcal{F}$ are satisfied, if $x \in X_n$ and $\theta \in U_\delta$, it is easily verified
that

(i) by 22, (1/nd")!! £(x, ) || < 1, 

(ii) by 3, (1/8)|! m.6,(@ — O%) | Ss, 

(iii) by 24, (1/nd’)!! v'? (a, 0) || < (1/8")s°x:!! @ — Oo |! S s*x,. 
It follows that (3.1) can then be expressed in the form (3.3) and 


(3) 1 2 3 > , 
lly?(2, a)! << 1+ 8° 4+ 8%, when aeX, and 6cU;. 
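Conditions on h. We impose the following conditions on the function h.

$\mathcal{H}1$. For every $\theta \in U_\alpha$ the partial derivatives $\partial h_k(\theta)/\partial\theta_i$, $i = 1, 2, \ldots, s$,
$k = 1, 2, \ldots, r$, exist and these are continuous functions of $\theta$.

$\mathcal{H}2$. For every $\theta \in U_\alpha$ the partial derivatives $\partial^2 h_k(\theta)/\partial\theta_i\,\partial\theta_j$, $i, j = 1, 2, \ldots, s$,
$k = 1, 2, \ldots, r$, exist and $|\partial^2 h_k(\theta)/\partial\theta_i\,\partial\theta_j| < 2\kappa_2$, a given constant, for all
i, j and k.

$\mathcal{H}3$. The $s \times r$ matrix $H_{\theta_0}$ is of rank r.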


The condition $\mathcal{H}2$ is introduced to ensure that when (3.2) is expressed in the
form (3.4), $\|v^{(4)}(\theta)\|$ is bounded for $\theta \in U_\delta$. It is clear that it does ensure this
since, as is easily verified, by $\mathcal{H}2$, $\|v^{(2)}(\theta)\| < s^2 \kappa_2 \|\theta - \theta_0\|^2$ and so $\|v^{(4)}(\theta)\| =
(1/\delta^2) \|v^{(2)}(\theta)\| < s^2 \kappa_2$ if $\theta \in U_\delta$.
Also the condition $\mathcal{H}3$ implies that the matrix $H'_{\theta_0} B_{\theta_0}^{-1} H_{\theta_0}$ is positive definite,
since the matrix $B_{\theta_0}^{-1}$ is positive definite. Since the elements of $H_\theta$ are, by $\mathcal{H}1$,
continuous functions of $\theta$ it follows that there exists a neighbourhood of $\theta_0$
in which $\det(H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)$ is bounded away from zero, and we may assume that
this neighbourhood is $U_\alpha$. (This assumption merely involves choosing $\alpha$ small
enough initially.) This means that when $\theta \in U_\alpha$ we can solve the equation (3.5)
for $\lambda$ in terms of $\theta$. Furthermore the elements of the matrix $(H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)^{-1}$ are
then continuous functions on $U_\alpha$ since the elements of $H_\theta$ are continuous and
$\det(H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)$ is bounded away from zero. Since $U_\alpha$ is a closed set it follows that
the elements of $(H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)^{-1}$ are uniformly bounded on $U_\alpha$. This result, together
with the results that $\|v^{(3)}(x, \theta)\|$ and $\|v^{(4)}(\theta)\|$ are bounded on $U_\delta$, enables us to
prove that when $\lambda$ is eliminated from (3.3) and (3.4), and (3.6) is obtained,
then in (3.6) $\|v(x, \theta)\|$ is bounded on $U_\delta$, if $x \in X_n$.

We have now gone a considerable way towards proving the main part of the 
following lemma. 

LEMMA 1. Subject to the conditions $\mathcal{F}$ and $\mathcal{H}$, if $\delta < \alpha$ and $\epsilon < 1$ are given positive
numbers and if $x \in X_n$, then the equations (2.1) and (2.2) have a solution
$\hat\theta(x)$, $\hat\lambda(x)$ such that $\hat\theta(x) \in U_\delta$, if and only if $\hat\theta(x)$ satisfies a certain equation of the
form $-B_{\theta_0}(\theta - \theta_0) + \delta^2 v(x, \theta) = 0$. In this equation $v(x, \cdot)$ is a continuous function
on $U_\delta$ and $\|v(x, \theta)\|$ is bounded for $\theta \in U_\delta$ by a positive number $\kappa_3$, say.

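PROOF. The fact that the condition is necessary has virtually been established
already. On eliminating $\lambda$ from (2.1) and (2.2) by the method outlined at the
beginning of §3 we obtain, in matrix notation, the following explicit expression
for (3.6):

(3.7)   $-B_{\theta_0}(\theta - \theta_0) - \delta^2 H_\theta (H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)^{-1} \{ v^{(4)}(\theta) + H'_{\theta_0} B_{\theta_0}^{-1} v^{(3)}(x, \theta) \} + \delta^2 v^{(3)}(x, \theta) = 0,$

where

(3.8)   $\delta^2 v^{(3)}(x, \theta) = \frac{1}{n}\, l(x, \theta) + B_{\theta_0}(\theta - \theta_0),$

(3.9)   $\delta^2 v^{(4)}(\theta) = h(\theta) - H'_{\theta_0}(\theta - \theta_0).$

Hence in (3.6),

(3.10)   $v(x, \theta) = -H_\theta (H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)^{-1} \{ v^{(4)}(\theta) + H'_{\theta_0} B_{\theta_0}^{-1} v^{(3)}(x, \theta) \} + v^{(3)}(x, \theta).$

The fact that $v(x, \cdot)$ is a continuous function on $U_\delta$ and that $\|v(x, \theta)\|$ is bounded
for $\theta \in U_\delta$ follows from (3.8), (3.9) and (3.10), in virtue of the discussion of
$v^{(3)}(x, \theta)$, $v^{(4)}(\theta)$ and $(H'_{\theta_0} B_{\theta_0}^{-1} H_\theta)^{-1}$ above.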

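Turning to the sufficiency of the condition we now suppose that the equation
(3.7) has a root $\hat\theta(x) \in U_\delta$. Then, writing $\hat\theta$ instead of $\hat\theta(x)$ for brevity, we obtain
on premultiplication of (3.7) by $H'_{\theta_0} B_{\theta_0}^{-1}$,

(3.11)   $-H'_{\theta_0}(\hat\theta - \theta_0) - \delta^2 v^{(4)}(\hat\theta) = 0,$

i.e., by (3.9),

$h(\hat\theta) = 0.$

Substitution for $\delta^2 v^{(4)}(\hat\theta)$ from (3.11) and for $\delta^2 v^{(3)}(x, \hat\theta)$ from (3.8), in (3.7) gives

$l(x, \hat\theta) = H_{\hat\theta} (H'_{\theta_0} B_{\theta_0}^{-1} H_{\hat\theta})^{-1} H'_{\theta_0} B_{\theta_0}^{-1}\, l(x, \hat\theta),$

or, if we write $Q'_{\hat\theta}$ for $(H'_{\theta_0} B_{\theta_0}^{-1} H_{\hat\theta})^{-1} H'_{\theta_0} B_{\theta_0}^{-1}$,

(3.12)   $l(x, \hat\theta) = H_{\hat\theta}\, Q'_{\hat\theta}\, l(x, \hat\theta).$

If we now define $\hat\lambda(x)$ by

$\hat\lambda(x) = -Q'_{\hat\theta}\, l(x, \hat\theta),$

then

$l(x, \hat\theta) + H_{\hat\theta}\, \hat\lambda(x) = 0,$

and $\hat\theta(x)$, $\hat\lambda(x)$ satisfy the equations (2.1) and (2.2).

In order to prove that the equation (3.6) has a root in $U_\delta$, if $\delta$ is sufficiently
small, we will require the following lemma.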


LEMMA 2. If g is a continuous function mapping $\mathcal{R}^s$ into itself with the property
that, for every $\theta$ such that $\|\theta\| = 1$, $\theta' g(\theta) < 0$, then there exists a point $\hat\theta$ such
that $\|\hat\theta\| < 1$ and $g(\hat\theta) = 0$.

PROOF. For the proof of this result we are indebted to Mr. J. M. Michael who
has proved that this result is equivalent to Brouwer's fixed point theorem [4].
A direct proof from the latter theorem is as follows.

We suppose that $g(\theta) \ne 0$ for any $\theta$ such that $\|\theta\| \le 1$. Then the function
$g_1$, defined on the unit sphere in $\mathcal{R}^s$ by

$g_1(\theta) = \frac{g(\theta)}{\|g(\theta)\|},$

is a continuous function mapping this unit sphere into itself. Hence by the fixed
point theorem there is a point $\theta^*$ in the unit sphere such that $\theta^* = g_1(\theta^*)$. Also
since $\|g_1(\theta)\| = 1$ for every $\theta$ in the unit sphere, it follows that $\|\theta^*\| = 1$, and
$\theta^{*\prime} g_1(\theta^*) = \theta^{*\prime} \theta^* = 1 > 0$. But this contradicts the fact that $\theta' g(\theta) < 0$
(and consequently that $\theta' g_1(\theta) < 0$) for every $\theta$ such that $\|\theta\| = 1$.

Hence there is a point $\hat\theta$ in the unit sphere such that $g(\hat\theta) = 0$. It is obvious
that $\|\hat\theta\| \ne 1$. Hence $\|\hat\theta\| < 1$.
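In one dimension Lemma 2 reduces to the intermediate value theorem: the condition $\theta g(\theta) < 0$ at $\theta = \pm 1$ says that $g(-1) > 0 > g(1)$, so a continuous g has a zero strictly inside $(-1, 1)$. The Python sketch below illustrates this special case only. It uses bisection in place of the fixed point argument of the proof, and the function name `find_zero` and the example g are hypothetical choices made for illustration.

```python
def find_zero(g, tol=1e-12):
    """Bisection on [-1, 1] for a continuous g with g(-1) > 0 > g(1)."""
    lo, hi = -1.0, 1.0
    if not (g(lo) > 0 > g(hi)):
        raise ValueError("need theta * g(theta) < 0 at theta = -1 and theta = 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid   # the sign change lies to the right of mid
        else:
            hi = mid   # the sign change lies at or to the left of mid
    return 0.5 * (lo + hi)


def g(theta):
    # Example function: g(-1) = 1.3 > 0 and g(1) = -0.7 < 0.
    return 0.3 - theta


theta_hat = find_zero(g)
```

For $s > 1$ no such elementary search is available in general, which is why the proof appeals to Brouwer's theorem.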
We are now in a position to prove the following existence theorem. 


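THEOREM 1. Subject to the conditions $\mathcal{F}$ and $\mathcal{H}$, if $\delta$ is a sufficiently small given
positive number, $\epsilon$ is a given positive number less than 1 and if $x \in X_n$, then the
equations (2.1) and (2.2) have a solution $\hat\theta(x)$, $\hat\lambda(x)$ such that $\hat\theta(x) \in U_\delta$.

PROOF. We suppose $\delta < \alpha$ and $x \in X_n$. We consider (3.6) and define a function
g on the unit sphere in $\mathcal{R}^s$ by

$g\!\left(\frac{\theta - \theta_0}{\delta}\right) = -B_{\theta_0}(\theta - \theta_0) + \delta^2 v(x, \theta).$

By Lemma 1, $v(x, \cdot)$ is a continuous function on $U_\delta$. Hence g is a continuous
function on the unit sphere in $\mathcal{R}^s$. Also

$(\theta - \theta_0)'\, g\!\left(\frac{\theta - \theta_0}{\delta}\right) = -(\theta - \theta_0)' B_{\theta_0} (\theta - \theta_0) + \delta^2 (\theta - \theta_0)' v(x, \theta) \le -\mu_0 \|\theta - \theta_0\|^2 + \delta^2 \kappa_3 \|\theta - \theta_0\|,$

if $\theta \in U_\delta$, since $B_{\theta_0}$ is positive definite with minimum latent root $\mu_0$ and, by
Lemma 1, $\|v(x, \theta)\| < \kappa_3$ when $\theta \in U_\delta$. Hence for every $\theta$ such that $\|\theta - \theta_0\| = \delta$,
we have

$(\theta - \theta_0)'\, g\!\left(\frac{\theta - \theta_0}{\delta}\right) \le \delta^3 (\delta \kappa_3 - \mu_0) < 0 \quad \text{if } \delta < \frac{\mu_0}{\kappa_3}.$

Hence if $\delta < \mu_0/\kappa_3$, it follows by Lemma 2 that there exists a point $\hat\theta(x)$ such that
$\hat\theta(x) \in U_\delta$ and $g((\hat\theta(x) - \theta_0)/\delta) = 0$, i.e., $\hat\theta(x)$ is a solution of (3.6). The result
follows by application of Lemma 1.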

4. Existence of a maximum of $L(x, \theta)$. In this paragraph we will show that
for sufficiently small $\delta$, if $x \in X_n$, any solution of (3.6) in $U_\delta$ maximises $L(x, \theta)$
subject to the condition $h(\theta) = 0$.

We suppose that $x \in X_n$, that $\delta$ is small enough for Theorem 1 to apply and
that $\hat\theta(x)$, written $\hat\theta$ for typographical brevity, is a solution in $U_\delta$ of (3.6). We
let $\theta$ be a point in a neighbourhood of $\hat\theta$ contained in $U_\delta$, such that $h(\theta) = 0$.
(Such a neighbourhood exists since $\hat\theta$ is an interior point of $U_\delta$.) Then by expanding
$L(x, \theta)$ about $\hat\theta$ we have

(4.1)   $L(x, \theta) - L(x, \hat\theta) = l'(x, \hat\theta)(\theta - \hat\theta) + \tfrac{1}{2}(\theta - \hat\theta)' M_{x,\theta^*} (\theta - \hat\theta),$

where $M_{x,\theta^*} = (\partial^2 L(x, \theta^*)/\partial\theta_i\,\partial\theta_j)$ and $\theta^* \in U_\delta$.

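We now consider separately the two terms in the right hand side of (4.1).
By (3.12)

$l'(x, \hat\theta)(\theta - \hat\theta) = l'(x, \hat\theta)\, Q_{\hat\theta} H'_{\hat\theta} (\theta - \hat\theta).$

Now

$0 = h(\theta) - h(\hat\theta) = H'_{\hat\theta}(\theta - \hat\theta) + r(\theta),$

where, because of $\mathcal{H}2$, by the same argument as was applied to $v^{(2)}(\theta)$ in (3.2),

(4.2)   $\|r(\theta)\| < s^2 \kappa_2 \|\theta - \hat\theta\|^2.$

Hence

$l'(x, \hat\theta)(\theta - \hat\theta) = -[Q'_{\hat\theta}\, l(x, \hat\theta)]'\, r(\theta).$

Also

$\frac{1}{n}\, l(x, \hat\theta) = -B_{\theta_0}(\hat\theta - \theta_0) + \delta^2 v^{(3)}(x, \hat\theta),$

and so

(4.3)   $\left\| \frac{1}{n}\, l(x, \hat\theta) \right\| < \kappa_4 \delta + \kappa_5 \delta^2, \quad \text{since } \hat\theta \in U_\delta,$

where $\kappa_4$ is a positive number depending only on the elements of $B_{\theta_0}$, and $\kappa_5$
is the bound on $\|v^{(3)}(x, \theta)\|$ obtained above. Also the elements of $Q_{\hat\theta}$ are bounded by a number
independent of $\delta$, since $\hat\theta \in U_\alpha$. Hence

(4.4)   $\frac{1}{n} \| Q'_{\hat\theta}\, l(x, \hat\theta) \| < \kappa_6 \delta + \kappa_7 \delta^2,$

where $\kappa_6$, $\kappa_7$ are positive numbers independent of $\delta$. From (4.2), (4.3) and (4.4)
it follows that

(4.5)   $\frac{1}{n} | l'(x, \hat\theta)(\theta - \hat\theta) | < (\kappa_6 \delta + \kappa_7 \delta^2)\, s^2 \kappa_2 \|\theta - \hat\theta\|^2.$

We now consider the second term of (4.1). By expanding the elements of
$M_{x,\theta^*}$ about $\theta_0$ we find that

$\frac{1}{n} M_{x,\theta^*} = \frac{1}{n} M_{x,\theta_0} + m^*_{x,\theta^*},$

where, as is easily shown using $\mathcal{X}4$, the moduli of the elements of the matrix
$m^*_{x,\theta^*}$ are less than $2 s \kappa_1 \delta$. Also by $\mathcal{X}3$,

$\frac{1}{n} M_{x,\theta_0} = -B_{\theta_0} + \delta m_{x,\theta_0},$

and so

$\frac{1}{n} M_{x,\theta^*} = -B_{\theta_0} + \delta m,$

say, where m is a matrix whose elements are bounded by a number independent
of $\delta$. Hence

(4.6)   $\frac{1}{2n} (\theta - \hat\theta)' M_{x,\theta^*} (\theta - \hat\theta) = -\tfrac{1}{2} (\theta - \hat\theta)' B_{\theta_0} (\theta - \hat\theta) + \tfrac{1}{2} \delta (\theta - \hat\theta)' m (\theta - \hat\theta) < -\tfrac{1}{2} \mu_0 \|\theta - \hat\theta\|^2 + \kappa_8 \delta \|\theta - \hat\theta\|^2,$

since $B_{\theta_0}$ is positive definite with minimum latent root $\mu_0$, and the elements of
m are bounded. Here $\kappa_8$ is a positive number depending only on the elements of
m. Using (4.5) and (4.6) in (4.1) we find that there exist positive numbers
$\kappa_9$, $\kappa_{10}$, independent of $\delta$, such that

$\frac{1}{n} [L(x, \theta) - L(x, \hat\theta)] < (-\tfrac{1}{2} \mu_0 + \kappa_9 \delta + \kappa_{10} \delta^2) \|\theta - \hat\theta\|^2.$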

It follows that if $\delta$ is sufficiently small then $L(x, \theta) < L(x, \hat\theta)$, i.e., $L(x, \hat\theta)$ is a
maximum value of $L(x, \theta)$ subject to $h(\theta) = 0$.

We have thus established the fact that, if the conditions $\mathcal{F}$ and $\mathcal{H}$ are
satisfied, there exists a consistent maximum likelihood estimator $\hat\theta$ of $\theta_0$ satisfying
the condition $h(\hat\theta) = 0$.

5. Asymptotic distributions. We return now to consideration of (3.1) and (3.2).
We suppose that $x \in X_n$ and that $\hat\theta(x)$, $\hat\lambda(x)$ is a solution of these equations with
$\hat\theta(x) \in U_\delta$, $\delta$ being small enough for such a solution to exist. Then, considering
the equations from a slightly different viewpoint we have,

(5.1)   $\frac{1}{n}\, l(x, \theta_0) - [B_{\theta_0} + b(x)][\hat\theta(x) - \theta_0] + [H_{\theta_0} + h(x)]\, \frac{1}{n} \hat\lambda(x) = 0,$

(5.2)   $[H'_{\theta_0} + h^*(x)][\hat\theta(x) - \theta_0] = 0,$

where b(x), h(x) and h*(x) are matrices whose elements tend to 0 as $\delta$ (and
hence $\|\hat\theta(x) - \theta_0\|$) $\to 0$. We now prove the following lemma.


LEMMA 3. The partitioned matrix

$\begin{pmatrix} B_{\theta_0} & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{pmatrix}$

is non-singular.

PROOF. For brevity we omit the suffix $\theta_0$. Then we wish to find a matrix

$\begin{pmatrix} P & Q \\ Q' & R \end{pmatrix}$

such that, in the usual notation,

$\begin{pmatrix} B & -H \\ -H' & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ Q' & R \end{pmatrix} = \begin{pmatrix} I_s & 0 \\ 0 & I_r \end{pmatrix},$

and this requires

(5.3)   $BP - HQ' = I_s,$
(5.4)   $BQ - HR = 0,$
(5.5)   $-H'P = 0,$
(5.6)   $-H'Q = I_r.$

These equations are easily solved since B is positive definite and H is of rank
r so that $H'B^{-1}H$ is non-singular. We obtain

$R = -(H'B^{-1}H)^{-1},$
$Q = -B^{-1}H(H'B^{-1}H)^{-1},$
$P = B^{-1}[I_s - H(H'B^{-1}H)^{-1}H'B^{-1}].$

We note at this stage, though we do not require this result immediately, that
the matrix P has rank $s - r$. For, from (5.5) since rank $(H') = r$, rank $(P) \le
s - r$. While from (5.3) we have $s = $ rank $(BP - HQ') \le$ rank $(BP)$ + rank
$(HQ') \le$ rank $(P) + r$, and so rank $(P) \ge s - r$.
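The solution quoted in the proof of Lemma 3 can be checked by direct substitution. The short verification below is added for the reader's convenience and uses only the expressions for P, Q and R given above; the abbreviation $A = H'B^{-1}H$ is introduced here and is not part of the original notation.

```latex
\begin{align*}
BP - HQ' &= \bigl[I_s - HA^{-1}H'B^{-1}\bigr] + HA^{-1}H'B^{-1} = I_s, \\
BQ - HR  &= -HA^{-1} + HA^{-1} = 0, \\
-H'P     &= -H'B^{-1} + (H'B^{-1}H)A^{-1}H'B^{-1} = -H'B^{-1} + H'B^{-1} = 0, \\
-H'Q     &= (H'B^{-1}H)A^{-1} = I_r .
\end{align*}
```

Each line uses only the symmetry of B (so that $Q' = -A^{-1}H'B^{-1}$) and the non-singularity of A.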

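We return now to equations (5.1) and (5.2). If $\delta$ is sufficiently small then the
matrix

$\begin{pmatrix} B_{\theta_0} + b(x) & -[H_{\theta_0} + h(x)] \\ -[H'_{\theta_0} + h^*(x)] & 0 \end{pmatrix}$

also will be non-singular and we will write

$\begin{pmatrix} B_{\theta_0} + b(x) & -[H_{\theta_0} + h(x)] \\ -[H'_{\theta_0} + h^*(x)] & 0 \end{pmatrix}^{-1} = \begin{pmatrix} P(x) & Q_1(x) \\ Q'_2(x) & R(x) \end{pmatrix}.$

Hence, from (5.1) and (5.2), for sufficiently small $\delta$, if $x \in X_n$, we have

(5.7)   $\begin{pmatrix} \hat\theta(x) - \theta_0 \\ \frac{1}{n} \hat\lambda(x) \end{pmatrix} = \begin{pmatrix} P(x) & Q_1(x) \\ Q'_2(x) & R(x) \end{pmatrix} \begin{pmatrix} \frac{1}{n}\, l(x, \theta_0) \\ 0 \end{pmatrix}.$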


If the functions $\hat\theta$ and $\hat\lambda$ were defined for the whole of $\mathcal{R}^n$ we could now
discuss immediately the asymptotic distribution of these functions. However this
is not necessarily so, and we go through the formality of extending the
definition of these functions to the whole of $\mathcal{R}^n$. We will then show that the random
variables thus defined are asymptotically normally distributed and, in this
sense, we may say that a consistent maximum likelihood estimator $\hat\theta$ of $\theta_0$ is
asymptotically normally distributed.

We let $(\delta_m)$, $(\epsilon_m)$ be decreasing sequences of positive numbers, such that
$\epsilon_1 < 1$, $\delta_1 < \mu_0/\kappa_3$ (see Theorem 1), and $\delta_m \to 0$ and $\epsilon_m \to 0$ as $m \to \infty$. We
then define an increasing sequence $(n_m)$ of integers such that, if $n \ge n_m$, there
exists a set in $\mathcal{R}^n$ with the properties $\mathcal{X}1$ to $\mathcal{X}4$ for $\epsilon = \epsilon_m$ and $\delta = \delta_m$. For
$m = 1, 2, \ldots$, if $n_m \le n < n_{m+1}$, we choose a set $X_n$ with the properties $\mathcal{X}1$
to $\mathcal{X}4$ for $\epsilon = \epsilon_m$ and $\delta = \delta_m$. Hence $\Pr\{X_n\} \to 1$ as $n \to \infty$ and if $n_m \le n
< n_{m+1}$ and $x \in X_n$, the likelihood equations (2.1) and (2.2) have a solution
$\hat\theta_n(x)$, $\hat\lambda_n(x)$ such that $\|\hat\theta_n(x) - \theta_0\| \le \delta_m$. Moreover for sufficiently large m,
$\hat\theta_n(x)$ is a maximum likelihood estimate of $\theta_0$, by §4. We now extend the
definition of $\hat\theta_n$ and $\hat\lambda_n$ to $\mathcal{R}^n$ by letting

$\begin{pmatrix} \hat\theta_n(x) - \theta_0 \\ \frac{1}{n} \hat\lambda_n(x) \end{pmatrix} = \begin{pmatrix} P & Q \\ Q' & R \end{pmatrix} \begin{pmatrix} \frac{1}{n}\, l(x, \theta_0) \\ 0 \end{pmatrix}, \quad x \notin X_n.$

We have thus defined sequences $(\hat\theta_n)$, $(\hat\lambda_n)$, $n = n_1, n_1 + 1, \ldots$, of random
variables which have the property that $\hat\theta_n$ converges in probability to $\theta_0$ and with
probability tending to 1 as $n \to \infty$, $\hat\theta_n$, $\hat\lambda_n$ satisfy the likelihood equations (2.1)
and (2.2).

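THEOREM 2. The random variables $n^{1/2}(\hat\theta_n - \theta_0)$, $n^{-1/2}\hat\lambda_n$ are asymptotically
jointly normally distributed with variance-covariance matrix

$\begin{pmatrix} P & 0 \\ 0 & -R \end{pmatrix}.$

PROOF. If $x \notin X_n$, we define $P(x) = P$, $Q_1(x) = Q$, $Q_2(x) = Q$ and $R(x) = R$.
Then for sufficiently large n, by (5.7) we may write

$\begin{pmatrix} \sqrt{n}\,(\hat\theta_n - \theta_0) \\ n^{-1/2}\hat\lambda_n \end{pmatrix} = \begin{pmatrix} P(\cdot) & Q_1(\cdot) \\ Q'_2(\cdot) & R(\cdot) \end{pmatrix} \begin{pmatrix} n^{-1/2}\, l(\cdot, \theta_0) \\ 0 \end{pmatrix}.$

The elements of the matrix

$\begin{pmatrix} P(\cdot) & Q_1(\cdot) \\ Q'_2(\cdot) & R(\cdot) \end{pmatrix}$

are random variables which converge in probability to the corresponding
elements of the matrix

$\begin{pmatrix} P & Q \\ Q' & R \end{pmatrix},$

since in (5.1) and (5.2) b, h and h* tend to 0 as $\delta \to 0$. Also the s-dimensional
random variable $n^{-1/2}\, l(\cdot, \theta_0)$ is asymptotically normally distributed with zero
mean and variance-covariance matrix $B_{\theta_0}$ (Cramér [1]), and the $(s + r)$-dimensional
random variable $(n^{-1/2}\, l(\cdot, \theta_0), 0)$ is asymptotically normally distributed
with zero mean and variance-covariance matrix

$\begin{pmatrix} B_{\theta_0} & 0 \\ 0 & 0 \end{pmatrix}.$

It follows by an extension, to a multi-dimensional random variable, of a
theorem of Cramér [2], that $\sqrt{n}\,(\hat\theta_n - \theta_0)$, $n^{-1/2}\hat\lambda_n$ are jointly asymptotically
normally distributed with zero mean and variance-covariance matrix

$\begin{pmatrix} P & Q \\ Q' & R \end{pmatrix} \begin{pmatrix} B_{\theta_0} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ Q' & R \end{pmatrix} = \begin{pmatrix} P B_{\theta_0} P & P B_{\theta_0} Q \\ Q' B_{\theta_0} P & Q' B_{\theta_0} Q \end{pmatrix}.$

(We omit details of the proof of this extension though this result, in contrast
to Cramér's result for real-valued random variables, is best obtained by
considering characteristic functions.) Now from (5.3), $P B_{\theta_0} P - P H_{\theta_0} Q' = P$.
Since P is symmetric, $P H_{\theta_0} = P' H_{\theta_0} = 0$ by (5.5). Hence $P B_{\theta_0} P = P$.
Similarly $P B_{\theta_0} Q = 0$ and $Q' B_{\theta_0} Q = -R$.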
This completes the proof of the Theorem. We note, however, that, as might
be expected, the asymptotic normal distribution of the s-dimensional random
variable $n^{1/2}(\hat\theta_n - \theta_0)$ is improper, being by the note in Lemma 3 of rank $s - r$.


6. Numerical solution of likelihood equations. In this section we will discuss 
an iterative procedure for solving (2.1) and (2.2) numerically, which yields an 
estimate of the matrices P and R. 

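In any practical situation we do not know $\theta_0$ and the only way in which we
can verify that the conditions $\mathcal{F}$ and $\mathcal{H}$ are satisfied is to find that, for every $\theta$
belonging to some set U in which we know $\theta_0$ lies, the following conditions $\mathcal{F}'$,
$\mathcal{H}'$ are satisfied.

$\mathcal{F}'1$, $\mathcal{F}'2$. For every $\theta \in U$, $\mathcal{F}1$ and $\mathcal{F}2$ are satisfied.

$\mathcal{F}'3$. For every $\theta \in U$ and $i, j, k = 1, 2, \ldots, s$,

$\left| \frac{\partial^3 \log f(t, \theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} \right| < F_3(t) \quad \text{and} \quad \int_{-\infty}^{\infty} F_3(t) f(t, \theta)\, dt < \kappa_1,$

a finite number.

$\mathcal{F}'4$. For every $\theta \in U$,

$b_{ij}(\theta) = \int_{-\infty}^{\infty} \frac{\partial \log f(t, \theta)}{\partial\theta_i}\, \frac{\partial \log f(t, \theta)}{\partial\theta_j}\, f(t, \theta)\, dt,$

$i, j = 1, 2, \ldots, s$, are finite, the matrix $B_\theta = (b_{ij}(\theta))$ is positive definite and,
if $\mu_\theta$ is the minimum latent root of $B_\theta$, then $\mu_\theta \ge \mu_0$ where $\mu_0$ is a given number
greater than 0.

$\mathcal{H}'1$, $\mathcal{H}'2$. For every $\theta \in U$, $\mathcal{H}1$ and $\mathcal{H}2$ are satisfied.

$\mathcal{H}'3$. For every $\theta \in U$, $H_\theta$ is of rank r.

The conditions $\mathcal{F}'$ are a straightforward generalization of Cramér's
conditions [2].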

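We will now assume that the conditions $\mathcal{F}'$ and $\mathcal{H}'$ are satisfied, that x is such
that the likelihood equations (2.1) and (2.2) have a solution $\hat\theta(x)$, $\hat\lambda(x)$ and that
$\theta^{(1)}$ is an initial approximation to $\hat\theta(x)$ such that $\|\theta^{(1)} - \hat\theta(x)\|$ is small. Then
to a first order of approximation

$l(x, \hat\theta) = l(x, \theta^{(1)}) + M_{x,\theta^{(1)}}(\hat\theta - \theta^{(1)}),$
$h(\hat\theta) = h(\theta^{(1)}) + H'_{\theta^{(1)}}(\hat\theta - \theta^{(1)}).$

Also if n is large, $(1/n)\hat\lambda(x)$ is near 0 for "most" x. We assume that x is a point
for which $(1/n)\hat\lambda(x)$ is near 0. Then we also have to a first order of approximation

$H_{\hat\theta}\, \frac{1}{n} \hat\lambda = H_{\theta^{(1)}}\, \frac{1}{n} \hat\lambda.$

Since $\hat\theta(x)$, $\hat\lambda(x)$ satisfy (2.1) and (2.2) then, approximately, we have

(6.1)   $\begin{pmatrix} -\frac{1}{n} M_{x,\theta^{(1)}} & -H_{\theta^{(1)}} \\ -H'_{\theta^{(1)}} & 0 \end{pmatrix} \begin{pmatrix} \hat\theta - \theta^{(1)} \\ \frac{1}{n} \hat\lambda \end{pmatrix} = \begin{pmatrix} \frac{1}{n}\, l(x, \theta^{(1)}) \\ h(\theta^{(1)}) \end{pmatrix}.$

The normal situation, if n is large, is that $\hat\theta(x)$ is near $\theta_0$. Consequently since
$\theta^{(1)}$ is near $\hat\theta(x)$ the matrix $-(1/n) M_{x,\theta^{(1)}}$ approximates $-(1/n) M_{x,\theta_0}$ which in
turn approximates $B_{\theta_0}$. Then $B_{\theta^{(1)}}$ approximates $B_{\theta_0}$ and we propose to replace
$-(1/n) M_{x,\theta^{(1)}}$ in (6.1) by $B_{\theta^{(1)}}$, and to obtain a correction to $\theta^{(1)}$, and an initial
approximation to $(1/n)\hat\lambda$, by solving the equation

(6.2)   $\begin{pmatrix} B_{\theta^{(1)}} & -H_{\theta^{(1)}} \\ -H'_{\theta^{(1)}} & 0 \end{pmatrix} \begin{pmatrix} \theta - \theta^{(1)} \\ \frac{1}{n} \lambda \end{pmatrix} = \begin{pmatrix} \frac{1}{n}\, l(x, \theta^{(1)}) \\ h(\theta^{(1)}) \end{pmatrix}.$

The idea of replacing $-(1/n) M_{x,\theta^{(1)}}$ by $B_{\theta^{(1)}}$ is not original though the authors
do not know where it originated.

Because of $\mathcal{F}'4$, $\mathcal{H}'3$, by Lemma 3, the matrix

$\begin{pmatrix} B_{\theta^{(1)}} & -H_{\theta^{(1)}} \\ -H'_{\theta^{(1)}} & 0 \end{pmatrix}$

is non-singular and we will denote its inverse by

$\begin{pmatrix} P_1 & Q_1 \\ Q'_1 & R_1 \end{pmatrix}.$

We define $\theta^{(2)}$, $\lambda^{(2)}$ by

$\begin{pmatrix} \theta^{(2)} \\ \frac{1}{n} \lambda^{(2)} \end{pmatrix} = \begin{pmatrix} \theta^{(1)} \\ 0 \end{pmatrix} + \begin{pmatrix} P_1 & Q_1 \\ Q'_1 & R_1 \end{pmatrix} \begin{pmatrix} \frac{1}{n}\, l(x, \theta^{(1)}) \\ h(\theta^{(1)}) \end{pmatrix}$

and, more generally, $\theta^{(r)}$, $\lambda^{(r)}$ by (with the obvious definition of $P_{r-1}$, $Q_{r-1}$
and $R_{r-1}$),

$\begin{pmatrix} \theta^{(r)} \\ \frac{1}{n} \lambda^{(r)} \end{pmatrix} = \begin{pmatrix} \theta^{(r-1)} \\ 0 \end{pmatrix} + \begin{pmatrix} P_{r-1} & Q_{r-1} \\ Q'_{r-1} & R_{r-1} \end{pmatrix} \begin{pmatrix} \frac{1}{n}\, l(x, \theta^{(r-1)}) \\ h(\theta^{(r-1)}) \end{pmatrix}.$

If the sequences $(\theta^{(r)})$, $(\lambda^{(r)})$ converge then they converge to a solution of the
likelihood equations, as is easily verified. We do not attempt to give rigorous
conditions under which these sequences do converge. However the fact that
we may expect them to converge in most practical situations follows from the
heuristic argument leading to (6.2).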
We have thus established an iterative procedure for solving the likelihood
equations. The heaviest part of the computation involved in this method is the
inversion of a matrix and computation will normally be reduced by considering
the sequences $(\bar\theta^{(r)})$, $(\bar\lambda^{(r)})$ defined by

$\begin{pmatrix} \bar\theta^{(r)} \\ \frac{1}{n} \bar\lambda^{(r)} \end{pmatrix} = \begin{pmatrix} \bar\theta^{(r-1)} \\ \frac{1}{n} \bar\lambda^{(r-1)} \end{pmatrix} + \begin{pmatrix} P_1 & Q_1 \\ Q'_1 & R_1 \end{pmatrix} \begin{pmatrix} \frac{1}{n}\, l(x, \bar\theta^{(r-1)}) + H_{\bar\theta^{(r-1)}}\, \frac{1}{n} \bar\lambda^{(r-1)} \\ h(\bar\theta^{(r-1)}) \end{pmatrix},$

$r = 1, 2, \ldots$, where the starting values of $\bar\theta$ and $\bar\lambda$ are the initial
approximations to $\hat\theta$ and $\hat\lambda$. Again if these sequences converge, they converge
to a solution of the likelihood equations since

$\begin{pmatrix} P_1 & Q_1 \\ Q'_1 & R_1 \end{pmatrix}$

is non-singular. And again we do not attempt to give conditions under which
they do converge. The main justifications we put forward for this computational
procedure are

(i) the similarity between this method and Newton's method, and

(ii) the fact that similar modifications of Newton's method have been used
successfully elsewhere, for example in probit analysis [3].

The main advantage of this method of solving the likelihood equations is that
it involves inversion of only one matrix.
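The two iterative schemes of this section can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the authors' computation: it uses a multinomial model with the single restraint $\sum_i \theta_i - 1 = 0$ (so that $l_i = n_i/\theta_i$, $B_\theta$ is taken as $\mathrm{diag}(1/\theta_i)$ and $H_\theta$ is a column of ones), the names `solve`, `bordered` and `restricted_mle` are hypothetical, the multiplier is started at zero, and each linear system is solved directly rather than by forming the inverse matrix explicitly.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, m))
        z[i] = (aug[i][m] - tail) / aug[i][i]
    return z


def bordered(theta):
    """Matrix [[B, -H], [-H', 0]] with B = diag(1/theta_i) and H a column of ones."""
    s = len(theta)
    k = [[0.0] * (s + 1) for _ in range(s + 1)]
    for i in range(s):
        k[i][i] = 1.0 / theta[i]
        k[i][s] = -1.0
        k[s][i] = -1.0
    return k


def restricted_mle(counts, theta, steps=5, refresh=True):
    """Iterate from theta; refresh=False keeps the matrix built at the start."""
    n = float(sum(counts))
    s = len(counts)
    mu = 0.0                   # current approximation to lambda / n
    k = bordered(theta)        # matrix at the initial approximation
    for _ in range(steps):
        if refresh:            # first scheme: rebuild the matrix, restart mu at 0
            k = bordered(theta)
            mu = 0.0
        # right-hand side: (1/n) l + H mu, followed by h
        rhs = [counts[i] / (n * theta[i]) + mu for i in range(s)]
        rhs.append(sum(theta) - 1.0)
        step = solve(k, rhs)
        theta = [theta[i] + step[i] for i in range(s)]
        mu += step[s]
    return theta, n * mu


start = [0.25, 0.25, 0.25]
theta_a, lam_a = restricted_mle([2, 3, 5], start, refresh=True)
theta_b, lam_b = restricted_mle([2, 3, 5], start, refresh=False)
```

In this particular model the first step already lands on the cell proportions, so both variants agree; in less special problems the fixed-matrix variant would be expected to take more steps in exchange for the single matrix factorisation.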


7. Tests of the model. In a situation such as is outlined in §1 two natural 
questions arise in practice regarding the adequacy of the model introduced to 
describe an experimental situation. 

(i) Does the true parameter point $\theta_0$ satisfy the condition $h(\theta_0) = 0$?

(ii) Is the true parameter point some hypothetical point $\theta^*$ such that
$h(\theta^*) = 0$?


And this is the natural order for these questions since the second would be 
asked only if the first were answered in the affirmative. We now propose a pro- 
cedure for answering these questions in this order. 

(i) The most natural approach to the first question would be as follows. We
would calculate an unrestrained maximum likelihood estimate $\hat\theta_u(x)$ of $\theta_0$, and
for $\hat\theta_u(x)$ we would have $l(x, \hat\theta_u(x)) = 0$. If $h(\hat\theta_u(x))$ were in some sense "near
enough" $0 \in \mathcal{R}^r$ then we would decide that in fact $h(\theta_0) = 0$. Dually, we might
calculate a maximum likelihood estimate $\hat\theta(x)$ subject to the restraint

$h(\hat\theta(x)) = 0$

and then decide that $h(\theta_0) = 0$ if $l(x, \hat\theta(x))$ were "near enough" $0 \in \mathcal{R}^s$. The
test we propose is based on the second possibility. We note that, by (2.1),

$H_{\hat\theta}\, \hat\lambda(x) = -l(x, \hat\theta(x))$

and it seems reasonable to decide that $h(\theta_0) = 0$ if $\hat\lambda(x)$ is in some sense "near
enough" $0 \in \mathcal{R}^r$.
We have seen in Theorem 2 that when $h(\theta_0) = 0$, $n^{-1/2}\hat\lambda$ is normally
distributed asymptotically with variance-covariance matrix $-R$, which is of rank r.
Consequently $-(1/n)\hat\lambda' R^{-1} \hat\lambda$ is asymptotically distributed as $\chi^2$ with r degrees
of freedom, when $h(\theta_0) = 0$, and, in obvious notation, $-(1/n)\hat\lambda' R_{\hat\theta}^{-1} \hat\lambda$ also is
approximately, for large n, distributed as $\chi^2$ with r degrees of freedom. We
propose to choose as a region of acceptance of the hypothesis that $h(\theta_0) = 0$
the set of x for which

$-\frac{1}{n}\, \hat\lambda'(x)\, R_{\hat\theta(x)}^{-1}\, \hat\lambda(x) \le k,$

where k is determined by

$\Pr\{\chi^2_{[r]} \le k\} = 0.95.$

This gives a test of size 5% of the hypothesis that $h(\theta_0) = 0$.
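A small worked case may make the statistic concrete. The Python sketch below is an illustration under assumptions that are not in the paper: the observations are taken to be normal with $\theta = (\mu, v)$, v being the variance, and the restraint is $h(\theta) = \mu - c = 0$, so that $r = 1$, $H = (1, 0)'$ and $B_\theta = \mathrm{diag}(1/v, 1/(2v^2))$. The restricted estimates are then $\hat\mu = c$ and $\hat v = (1/n)\sum_i (x_i - c)^2$, and (2.1) gives $\hat\lambda = -n(\bar{x} - c)/\hat v$. The function name `lm_statistic` and the rounded critical value are likewise choices made here.

```python
def lm_statistic(xs, c):
    """-(1/n) lambda' R^{-1} lambda for the illustrative normal model, h = mu - c."""
    n = len(xs)
    xbar = sum(xs) / n
    v_hat = sum((x - c) ** 2 for x in xs) / n  # restricted estimate of the variance
    lam = -n * (xbar - c) / v_hat              # multiplier from (2.1) with H = (1, 0)'
    r = -1.0 / v_hat                           # R = -(H' B^{-1} H)^{-1}, evaluated at v_hat
    return -(lam * lam / r) / n


CHI2_95_1DF = 3.841  # approximate 0.95 point of chi-square with 1 degree of freedom

stat = lm_statistic([0.0, 2.0], 0.0)
accept = stat <= CHI2_95_1DF
```

Under these assumptions the statistic simplifies to $n(\bar{x} - c)^2/\hat v$. The sketch divides by $\hat v$ and so requires that not every observation equal c.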

(ii) The natural corollary of using the asymptotic distribution of $\hat\lambda$ in this
way is to use the asymptotic distribution of $\hat\theta$ as established in Theorem 2 to
answer the second question. If $\theta^* = \theta_0$ then $n(\hat\theta - \theta^*)'B_{\theta^*}(\hat\theta - \theta^*)$ is approximately
distributed as $\chi^2$ with s − r degrees of freedom if n is large. This is
easily established by noting that a consequence of equations (5.3)-(5.6) is
that $B^{-1} = PBP - QR^{-1}Q'$, and hence that


! BL = n(6 — 6)'B(6 — @) — | RR. 

n n 
We use this fact as in the previous paragraph to establish a region of acceptance 
of the hypothesis that the true parameter point is 6*. 

Here no attempt is made to justify this test on other than an intuitive basis. 
Since the Lagrangian multiplier test seems to be of wide applicability and of
considerable importance in practical statistics, it will be fully discussed both 
from the theoretical and practical points of view in subsequent papers. 

REFERENCES 
[1] H. Cramér, "Random variables and probability distributions," Cambridge University
Press, 1937.
[2] H. Cramér, "Mathematical methods of statistics," Princeton University Press, 1949.
[3] D. J. Finney, "Probit analysis," Cambridge University Press, 1947.
[4] S. Lefschetz, "Introduction to topology," Princeton University Press, 1949.





CONFIDENCE BOUNDS ON VECTOR ANALOGUES OF THE "RATIO
OF MEANS" AND THE "RATIO OF VARIANCES" FOR TWO
CORRELATED NORMAL VARIATES AND SOME ASSOCIATED TESTS


S. N. Roy and R. F. Potthoff
University of North Carolina


1. Summary and Introduction. In this paper confidence bounds are obtained (i)
on the ratio of variances of a (possibly) correlated bivariate normal population,
and then, by generalization, (ii) on a set of parametric functions of a (possibly)
correlated (p + p)-variate normal population, which plays the same role for a
2p-variate population as the ratio of variances does for the bivariate case, (iii)
on the ratio of means of the population indicated in (i), and, by generalization,
(iv) on a set of parametric functions of the population indicated in (ii), which
plays the same role for this problem as the ratio of means does for the bivariate
case. For (i) and (iii) the confidence coefficient is any preassigned $1 - \alpha$ and the
distribution involved is the central t-distribution, while for (ii) and (iv), the
confidence statement in each case is a simultaneous one with a joint confidence
coefficient greater than or equal to a preassigned $1 - \alpha$. For (ii) the distribution
involved is that of the central largest canonical correlation coefficient (squared),
and for (iv) the distribution involved is that of the central Hotelling's $T^2$. As
far as the authors are aware the results on (ii) and (iv) are new and so perhaps 
that on (i). But the result on (iii) has been in the field for a long time in various 
superficially different forms. An important point to keep in mind on these 
problems is that, for such confidence bounds and the associated tests of hy- 
potheses to be physically meaningful, the two variates for the bivariate distribu- 
tion should be comparable. For example, they might refer to the same char- 
acteristic of a set of individuals before and after a feed. Likewise, for a (p + 
p)-variate distribution, the p variates of the first set should be comparable to p 
variates of the second set. For example, they might refer to several characteristics 
of a set of individuals before and after a treatment. In each case the confidence 
bounds are obtained by inverting the test of a certain hypothesis, which is 
indicated at its proper place. Thus, for the (p + p)-variate problem, we assume 
that there are p pairs of comparable variates and it is the pairwise comparison 
for these p pairs that seems, in this situation, to be physically more meaningful 
than anything else. Any general bounds that will be obtained in this paper are to 
be regarded, in a large measure, as a means to this end, although there could 
conceivably be physical questions, some of which will be illustrated in a later 
applied paper to be published elsewhere, to which these more general bounds 
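would be pertinent.

2. Confidence bounds for the case (i). Suppose we have a random sample of
size n (> 2) from a population:

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} : N\left[\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix},\ \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right].$$

Let us denote the sample means by $\bar x_1$, $\bar x_2$, and the sample dispersion matrix by

$$\begin{pmatrix} s_1^2 & s_1 s_2 r \\ s_1 s_2 r & s_2^2 \end{pmatrix}.$$

Then for any constant $\lambda$, it is easy to check that the covariance of $(x_1 - \lambda x_2,\ x_1 + \lambda x_2)$
is $\mathrm{var}(x_1) - \lambda^2\,\mathrm{var}(x_2) = \sigma_1^2 - \lambda^2\sigma_2^2$.

This will be zero if $\lambda^2 = \sigma_1^2/\sigma_2^2$. Thus, with $\lambda^2 = \sigma_1^2/\sigma_2^2$, the variates
$x_1 - \lambda x_2$ and $x_1 + \lambda x_2$ will be uncorrelated and hence, denoting by $r^*$ the sample
correlation coefficient between these two variates, we have that $r^*$ has the
(central) r-distribution, i.e., $\sqrt{n-2}\,r^*/(1 - r^{*2})^{1/2}$ has the (central) t-distribution
with d.f. (n − 2). But it is easy to check that

$$r^* = \frac{s_1^2 - \lambda^2 s_2^2}{[(s_1^2 + \lambda^2 s_2^2 + 2\lambda s_1 s_2 r)(s_1^2 + \lambda^2 s_2^2 - 2\lambda s_1 s_2 r)]^{1/2}} = \frac{s_1^2 - \lambda^2 s_2^2}{[s_1^4 + \lambda^4 s_2^4 + 2\lambda^2 s_1^2 s_2^2(1 - 2r^2)]^{1/2}}. \tag{2.1}$$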
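
The identity (2.1) can be checked numerically. The sketch below is an illustration only, assuming the usual divisor n − 1 in the sample variances and covariance (the divisor cancels in any correlation); the helper names are hypothetical. It compares the closed form for $r^*$ with the correlation computed directly from the transformed variates $x_1 - \lambda x_2$ and $x_1 + \lambda x_2$.

```python
import math
import random


def sample_stats(xs, ys):
    """Return (s_x, s_y, r): sample standard deviations and correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return math.sqrt(sxx), math.sqrt(syy), sxy / math.sqrt(sxx * syy)


def r_star_formula(s1, s2, r, lam):
    """Closed form (2.1) for the correlation of x1 - lam*x2 and x1 + lam*x2."""
    num = s1 ** 2 - lam ** 2 * s2 ** 2
    den = math.sqrt(s1 ** 4 + lam ** 4 * s2 ** 4
                    + 2 * lam ** 2 * s1 ** 2 * s2 ** 2 * (1 - 2 * r ** 2))
    return num / den


def r_star_direct(xs, ys, lam):
    """Correlation computed from the transformed variates themselves."""
    u = [x - lam * y for x, y in zip(xs, ys)]
    v = [x + lam * y for x, y in zip(xs, ys)]
    return sample_stats(u, v)[2]


random.seed(0)
x1 = [random.gauss(0.0, 2.0) for _ in range(30)]
x2 = [0.5 * a + random.gauss(0.0, 1.0) for a in x1]
s1, s2, r = sample_stats(x1, x2)
difference = r_star_formula(s1, s2, r, 1.7) - r_star_direct(x1, x2, 1.7)
```

The two computations should agree up to rounding error for any choice of $\lambda$.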


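Now, starting from the statement (with a probability $1 - \alpha$)

$$\sqrt{n-2}\,|r^*|/(1 - r^{*2})^{1/2} \le t_{\alpha/2}(n-2), \text{ or } \le t_{\alpha/2} \text{ (more simply)}, \tag{2.2}$$

where $t_{\alpha/2}(n-2)$ is the upper $\alpha/2$-point of the (central) t-distribution with d.f.
(n − 2), and remembering that $\lambda = \sigma_1/\sigma_2$ and substituting from (2.1) for $r^*$ in
terms of $s_1$, $s_2$ and r, we have, for $\sigma_1^2/\sigma_2^2$, the following confidence equation (2.3)
and confidence bounds (2.4) (with a confidence coefficient $1 - \alpha$):

$$\lambda^4 - 2\lambda^2\,\frac{s_1^2}{s_2^2}\left[1 + \frac{2}{n-2}\,t_{\alpha/2}^2(1 - r^2)\right] + \frac{s_1^4}{s_2^4} \le 0, \tag{2.3}$$

$$\frac{s_1^2}{s_2^2}\left[1 + \frac{2}{n-2}\,t_{\alpha/2}^2(1 - r^2) - 2\left\{\frac{t_{\alpha/2}^2(1 - r^2)}{n-2}\left(1 + \frac{t_{\alpha/2}^2(1 - r^2)}{n-2}\right)\right\}^{1/2}\right] \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\left[1 + \frac{2}{n-2}\,t_{\alpha/2}^2(1 - r^2) + 2\left\{\frac{t_{\alpha/2}^2(1 - r^2)}{n-2}\left(1 + \frac{t_{\alpha/2}^2(1 - r^2)}{n-2}\right)\right\}^{1/2}\right]. \tag{2.4}$$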
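
As an illustration of how bounds of this kind on $\sigma_1^2/\sigma_2^2$ might be computed, the sketch below solves the inequality obtained by squaring (2.2) and substituting (2.1), namely $(s_1^2 - \lambda^2 s_2^2)^2 \le 4\lambda^2 s_1^2 s_2^2 (1 - r^2)\,t^2/(n-2)$, as a quadratic in $\lambda^2$. The function name and the shorthand `K` are assumptions made here, and the critical value $t_{\alpha/2}(n-2)$ is taken as an input from tables rather than computed.

```python
import math


def variance_ratio_bounds(s1_sq, s2_sq, r, n, t_crit):
    """Bounds on sigma_1^2 / sigma_2^2 from the roots of the quadratic in lambda^2.

    s1_sq, s2_sq : sample variances of the two variates
    r            : sample correlation coefficient
    n            : sample size (> 2)
    t_crit       : upper alpha/2 point of the t-distribution with n - 2 d.f.
    """
    K = t_crit ** 2 * (1.0 - r ** 2) / (n - 2)   # shorthand used in this sketch only
    spread = 2.0 * math.sqrt(K * (1.0 + K))
    ratio = s1_sq / s2_sq
    return ratio * (1.0 + 2.0 * K - spread), ratio * (1.0 + 2.0 * K + spread)


# Example: n = 12, so d.f. = 10 and the tabulated two-sided 5% point is 2.228.
lower, upper = variance_ratio_bounds(4.0, 1.0, 0.5, 12, 2.228)
```

The two multipliers of $s_1^2/s_2^2$ are reciprocals of each other, so the interval always contains the sample ratio and is symmetric about it on a logarithmic scale.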

We notice that $\lambda = \sigma_1/\sigma_2 = 1$ if and only if $\sigma_1 = \sigma_2$.
Notice that (2.2) or (2.3) can be used as an acceptance region for the
hypothesis $\sigma_1/\sigma_2 = \lambda$ (any specific value) against the alternative $\sigma_1/\sigma_2 \ne \lambda$.
Since the paper was written it has been brought to the notice of the authors that
this region, for the case of $\sigma_1/\sigma_2 = 1$, i.e., for $\sigma_1 = \sigma_2$, has been explicitly given
by Walker and Lev [5].


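3. Confidence bounds for the case (ii). Suppose we have

$$x(2p \times 1) = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} : N\left[\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix},\ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}\right] = N[\xi(2p \times 1),\ \Sigma(2p \times 2p)] \text{ (say)},$$

where $x_1$, $x_2$, $\xi_1$, $\xi_2$ are $p \times 1$ and each $\Sigma_{ij}$ is $p \times p$,
and a random sample of size n (> 2p) from this population, with a sample
dispersion matrix denoted by

$$\begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix} = S(2p \times 2p) \text{ (say)}. \tag{3.1}$$

It is well known [3] that we can choose (non-singular) matrices $\mu(p \times p)$ and
$\nu(p \times p)$ such that

$$\Sigma_{11} = \mu\mu', \quad \Sigma_{22} = \nu\nu' \quad \text{and} \quad \Sigma_{12} = \mu D_{\sqrt\gamma}\,\nu', \tag{3.2}$$

where the $\gamma$'s, i.e., $\gamma_1, \gamma_2, \cdots, \gamma_p$, are the characteristic roots of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$ and
$D_{\sqrt\gamma}$ is a diagonal matrix whose diagonal elements are $\gamma_1^{1/2}, \cdots, \gamma_p^{1/2}$. It is also well
known [3] that these roots are all non-negative, that the number of positive
roots is the same as the rank of $\Sigma_{12}$ and that all the roots are zero if, and only if,
$\Sigma_{12} = 0$.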


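Now introduce a new variate $x^*(2p \times 1)$ defined by

$$x^*(2p \times 1) = \begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} \text{ (say)} = A(2p \times 2p)\,x(2p \times 1), \tag{3.3}$$

where

$$A(2p \times 2p) = \begin{bmatrix} I & -\lambda \\ I & \lambda \end{bmatrix}, \qquad \lambda = \mu\nu^{-1}. \tag{3.4}$$

Then this $x^*$ is $N(\xi^*, \Sigma^*)$, where $\xi^* = A\xi$ and

$$\Sigma^* = A\Sigma A' = \begin{bmatrix} \Sigma_{11}^* & \Sigma_{12}^* \\ \Sigma_{12}^{*\prime} & \Sigma_{22}^* \end{bmatrix} \text{ (say)}, \tag{3.5}$$

whence we have that

$$\Sigma_{11}^* = 2(\mu\mu' - \mu D_{\sqrt\gamma}\,\mu'), \qquad \Sigma_{22}^* = 2(\mu\mu' + \mu D_{\sqrt\gamma}\,\mu'),$$

$$\Sigma_{12}^* = \Sigma_{11} - \lambda\Sigma_{12}' + \Sigma_{12}\lambda' - \lambda\Sigma_{22}\lambda' = \mu\mu' - \mu D_{\sqrt\gamma}\,\mu' + \mu D_{\sqrt\gamma}\,\mu' - \mu\mu' = 0. \tag{3.6}$$

This means that the transformed p-set $x_1^*$ is uncorrelated with the transformed p-set
$x_2^*$. We shall put simultaneous confidence bounds on the largest and smallest
characteristic roots of $\lambda\lambda'$, i.e., of $\mu\nu^{-1}\nu'^{-1}\mu'$, and then show at the end of this
section how these roots are, in a sense, a generalization of $\sigma_1^2/\sigma_2^2$ for case (i).
We may note here, incidentally, that for p = 1, $\lambda$ does, in fact, reduce to $\sigma_1/\sigma_2$.
Next, denoting by $S^*$ the sample dispersion matrix of $x^*$, we have

$$S^*(2p \times 2p) = \begin{bmatrix} S_{11}^* & S_{12}^* \\ S_{12}^{*\prime} & S_{22}^* \end{bmatrix} \text{ (say)} = ASA', \tag{3.7}$$

whence we have

$$\begin{aligned} S_{11}^* &= S_{11} - \lambda S_{12}' - S_{12}\lambda' + \lambda S_{22}\lambda', \\ S_{12}^* &= S_{11} - \lambda S_{12}' + S_{12}\lambda' - \lambda S_{22}\lambda', \\ S_{22}^* &= S_{11} + \lambda S_{12}' + S_{12}\lambda' + \lambda S_{22}\lambda'. \end{aligned} \tag{3.8}$$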


Now we go back to (3.6). Note that, since $\Sigma_{12}^* = 0$, the transformed $x_1^*$-set is
uncorrelated with the transformed $x_2^*$-set, and also that, in this case, the joint
distribution of the canonical correlation coefficients and also, in particular, of
the largest canonical correlation coefficient is known. Thus we can find
a $c_\alpha(p, p, n - 1) = c_\alpha$ (say) such that

$$P[c_{\max}(S_{11}^{*-1} S_{12}^* S_{22}^{*-1} S_{12}^{*\prime}) \le c_\alpha \mid \Sigma_{12}^* = 0] = 1 - \alpha. \tag{3.9}$$

The set over which the probability statement (3.9) is made, namely,

$$c_{\max}(S_{11}^{*-1} S_{12}^* S_{22}^{*-1} S_{12}^{*\prime}) \le c_\alpha,$$

can be used as an acceptance region for the hypothesis that $\mu\nu^{-1}$ has a particular
(matrix) value, and, in particular, that $\mu\nu^{-1} = I(p)$, or in other words, $\Sigma_{11} = \Sigma_{22}$.
The problem now is to start from (3.9), use (3.8) and try to obtain confidence
bounds on functions connected with $\lambda(= \mu\nu^{-1})$. For this we proceed as follows.
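Let c be a characteristic root of the matrix in (3.9). Then

$$|cS_{11}^* - S_{12}^* S_{22}^{*-1} S_{12}^{*\prime}| = 0. \tag{3.10}$$

With c = 1 − 4d, this reduces to

$$|dS_{11}^* - \tfrac14 S_{11}^* + \tfrac14 S_{12}^* S_{22}^{*-1} S_{12}^{*\prime}| = 0. \tag{3.11}$$

Now, using (3.8), we have

$$\tfrac14 S_{11}^* - \tfrac14 S_{12}^* S_{22}^{*-1} S_{12}^{*\prime} = S_{11} - \tfrac14 (S_{12}^* + S_{22}^*)\, S_{22}^{*-1} (S_{12}^{*\prime} + S_{22}^*) = S_{11} - (S_{11} + S_{12}\lambda')\, S_{22}^{*-1} (S_{11} + \lambda S_{12}'). \tag{3.12}$$

Hence (3.11) becomes

$$|dS_{11}^* - S_{11} + (S_{11} + S_{12}\lambda')\, S_{22}^{*-1} (S_{11} + \lambda S_{12}')| = 0. \tag{3.13}$$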
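Next, we recall that for a non-singular $M_4(q \times q)$ we have

$$\begin{vmatrix} M_1 & M_2 \\ M_3 & M_4 \end{vmatrix} = |M_4|\,|M_1 - M_2 M_4^{-1} M_3|, \tag{3.14}$$

where $M_1$ is $p \times p$, and, using this, we observe that (3.13) is equivalent to

$$\begin{vmatrix} S_{11} - dS_{11}^* & S_{11} + S_{12}\lambda' \\ S_{11} + \lambda S_{12}' & S_{11} + \lambda S_{12}' + S_{12}\lambda' + \lambda S_{22}\lambda' \end{vmatrix} = 0, \tag{3.15}$$

that is,

$$\begin{vmatrix} S_{11} - dS_{11}^* & S_{12}\lambda' + dS_{11}^* \\ S_{11} + \lambda S_{12}' & S_{12}\lambda' + \lambda S_{22}\lambda' \end{vmatrix} = \begin{vmatrix} S_{11} - dS_{11}^* & S_{12}\lambda' + dS_{11}^* \\ \lambda S_{12}' + dS_{11}^* & \lambda S_{22}\lambda' - dS_{11}^* \end{vmatrix} = \left|\begin{bmatrix} S_{11} & S_{12}\lambda' \\ \lambda S_{12}' & \lambda S_{22}\lambda' \end{bmatrix} - d\begin{bmatrix} S_{11}^* & -S_{11}^* \\ -S_{11}^* & S_{11}^* \end{bmatrix}\right| = 0.$$

But we have

$$\begin{bmatrix} S_{11} & S_{12}\lambda' \\ \lambda S_{12}' & \lambda S_{22}\lambda' \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & \lambda \end{bmatrix} S \begin{bmatrix} I & 0 \\ 0 & \lambda' \end{bmatrix} \tag{3.16}$$

and

$$\begin{bmatrix} S_{11}^* & -S_{11}^* \\ -S_{11}^* & S_{11}^* \end{bmatrix} = \begin{bmatrix} I \\ -I \end{bmatrix}\begin{bmatrix} I & -\lambda \end{bmatrix} S \begin{bmatrix} I \\ -\lambda' \end{bmatrix}\begin{bmatrix} I & -I \end{bmatrix}.$$

Hence (3.15) reduces to

$$\left|\begin{bmatrix} I & 0 \\ 0 & \lambda \end{bmatrix} S \begin{bmatrix} I & 0 \\ 0 & \lambda' \end{bmatrix} - d\begin{bmatrix} I \\ -I \end{bmatrix}\begin{bmatrix} I & -\lambda \end{bmatrix} S \begin{bmatrix} I \\ -\lambda' \end{bmatrix}\begin{bmatrix} I & -I \end{bmatrix}\right| = 0,$$

which is equivalent to

$$\left|eS - \begin{bmatrix} I & 0 \\ 0 & \lambda^{-1} \end{bmatrix}\begin{bmatrix} I \\ -I \end{bmatrix}\begin{bmatrix} I & -\lambda \end{bmatrix} S \begin{bmatrix} I \\ -\lambda' \end{bmatrix}\begin{bmatrix} I & -I \end{bmatrix}\begin{bmatrix} I & 0 \\ 0 & \lambda'^{-1} \end{bmatrix}\right| = 0, \tag{3.17}$$

where e = 1/d, which again reduces to

$$|eI(2p \times 2p) - S^{-1}\beta S\beta'| = 0, \tag{3.18}$$

where

$$\beta(2p \times 2p) = \begin{bmatrix} I & 0 \\ 0 & \lambda^{-1} \end{bmatrix}\begin{bmatrix} I & -\lambda \\ -I & \lambda \end{bmatrix} = \begin{bmatrix} I & -\lambda \\ -\lambda^{-1} & I \end{bmatrix}. \tag{3.19}$$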


Now we go back to (3.9), recall that $e = 1/d = 4/(1 - c)$, put $e_\alpha = 4/(1 - c_\alpha)$,
observe that "$c_{\max} \le c_\alpha$" is equivalent to "$e_{\max} \le e_\alpha$," and hence
that (3.9) is equivalent to

$$P[c_{\max}[S^{-1}\beta S\beta'] \le e_\alpha \mid \Sigma_{12}^* = 0] = 1 - \alpha$$
or to 
( 'B 2 "Ss 'S -] : 
Pr Fa SE la ae : idl =f 0 for all non null 
(3.20) 


a(2p X 1) and b(2p X » | =l-a. 


Next, consider, for all non null a and b, the statement

$$\frac{(a'\beta b)^2}{(a'a)(b'b)} \le e_\alpha\,\frac{a'Sa}{a'a}\,\frac{b'S^{-1}b}{b'b}. \tag{3.21}$$

Now specialize $a'$ and $b'$ (each $1 \times 2p$) into $[a_1'\ \ 0]$ and $[0\ \ b_2']$, and
also into $[0\ \ a_2']$ and $[b_1'\ \ 0]$, where $a_1$, $a_2$, $b_1$, $b_2$ are $p \times 1$.
We next set

$$S^{-1}(2p \times 2p) = \begin{bmatrix} S^{11} & S^{12} \\ S^{12\prime} & S^{22} \end{bmatrix}, \tag{3.22}$$

whence we have

$$S^{11} = (S_{11} - S_{12}S_{22}^{-1}S_{12}')^{-1}, \quad S^{22} = (S_{22} - S_{12}'S_{11}^{-1}S_{12})^{-1}, \quad S^{12} = -S^{11}S_{12}S_{22}^{-1} = -S_{11}^{-1}S_{12}S^{22}. \tag{3.23}$$

Back in (3.21) we now observe that (3.21) implies

$$\frac{(a_1'\lambda b_2)^2}{(a_1'a_1)(b_2'b_2)} \le e_\alpha\,\frac{a_1'S_{11}a_1}{a_1'a_1}\,\frac{b_2'S^{22}b_2}{b_2'b_2} \tag{3.24}$$

for all non null $a_1$ and $b_2$, and that (3.21) also implies

$$\frac{(a_2'\lambda^{-1}b_1)^2}{(a_2'a_2)(b_1'b_1)} \le e_\alpha\,\frac{a_2'S_{22}a_2}{a_2'a_2}\,\frac{b_1'S^{11}b_1}{b_1'b_1} \tag{3.25}$$

for all non null $a_2$ and $b_1$. If now we consider the left side of (3.24), then it follows
from Cauchy's inequality that for all non null $b_2$, $(a_1'\lambda b_2)^2/(a_1'a_1)(b_2'b_2) \le
(a_1'\lambda\lambda'a_1)/(a_1'a_1)$, and it is also well known that for all non null $a_1$, $c_{\min}(\lambda\lambda') \le
(a_1'\lambda\lambda'a_1)/(a_1'a_1) \le c_{\max}(\lambda\lambda')$. We have also exactly similar results by
interchanging $a_1$ and $b_2$, and similar results on the left side of (3.25), in terms of
$\lambda^{-1}$ and $a_2$ and $b_1$, and then again by the interchange of $a_2$ and $b_1$.

Next, maximizing the left side of (3.24) w.r.t. $a_1$ and $b_2$, we observe ([2], [3],
[4]) that (3.24) and hence (3.21) imply

$$c_{\max}(\lambda\lambda') \le e_\alpha\, c_{\max}(S_{11})\, c_{\max}(S^{22}),$$

or, after substitution from (3.23),

$$c_{\max}(\lambda\lambda') \le e_\alpha\, c_{\max}(S_{11})/c_{\min}(S_{22} - S_{12}'S_{11}^{-1}S_{12}). \tag{3.26}$$

Likewise, maximizing the left side of (3.25) w.r.t. $a_2$ and $b_1$, we observe [4] that
(3.25) and hence (3.21) imply

$$c_{\max}(\lambda^{-1}\lambda'^{-1}) \le e_\alpha\, c_{\max}(S_{22})\, c_{\max}(S^{11}). \tag{3.27}$$

Now recall [3] that, since all non zero roots of $\lambda^{-1}\lambda'^{-1}$ are also roots of $\lambda'^{-1}\lambda^{-1}$,
i.e., of $(\lambda\lambda')^{-1}$, and $\lambda$ is nonsingular, therefore $c_{\min}(\lambda^{-1}\lambda'^{-1}) = c_{\min}((\lambda\lambda')^{-1}) =
1/c_{\max}(\lambda\lambda')$ and also similarly $c_{\max}(\lambda^{-1}\lambda'^{-1}) = 1/c_{\min}(\lambda\lambda')$. At this point,
using (3.23) we observe that (3.27) and hence (3.25) and hence (3.21) imply

$$c_{\min}(\lambda\lambda') \ge \frac{1}{e_\alpha}\, c_{\min}(S_{11} - S_{12}S_{22}^{-1}S_{12}')/c_{\max}(S_{22}). \tag{3.28}$$


Also, going back to (3.24) and first maximizing the left side of it w.r.t. $b_2$ and
then minimizing the right side w.r.t. $a_1$, we observe [4] that (3.24) and hence
(3.21) imply

$$c_{\min}(\lambda\lambda') \le e_\alpha\, c_{\min}(S_{11})/c_{\min}(S_{22} - S_{12}'S_{11}^{-1}S_{12}), \tag{3.29}$$

and, furthermore, first maximizing the left side w.r.t. $a_1$ and then minimizing the
right side w.r.t. $b_2$, we observe [4] that (3.24) and hence (3.21) also imply

$$c_{\min}(\lambda\lambda') \le e_\alpha\, c_{\max}(S_{11})/c_{\max}(S_{22} - S_{12}'S_{11}^{-1}S_{12}). \tag{3.30}$$

Likewise, back in (3.25), first maximizing the left side w.r.t. $b_1$ and then
minimizing the right side w.r.t. $a_2$, we observe [4] that (3.25) and hence (3.21) imply

$$c_{\max}(\lambda\lambda') \ge \frac{1}{e_\alpha}\, c_{\min}(S_{11} - S_{12}S_{22}^{-1}S_{12}')/c_{\min}(S_{22}), \tag{3.31}$$

and first maximizing the left side w.r.t. $a_2$ and then minimizing the right side
w.r.t. $b_1$, we observe [4] that (3.25) and hence (3.21) also imply

$$c_{\max}(\lambda\lambda') \ge \frac{1}{e_\alpha}\, c_{\max}(S_{11} - S_{12}S_{22}^{-1}S_{12}')/c_{\max}(S_{22}). \tag{3.32}$$


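Now combining (3.26), (3.28), (3.29)-(3.32), we observe that (3.21) implies all
these statements, and hence, going back to (3.20), we have with a joint
probability $\ge 1 - \alpha$, the bounds

$$\frac{1}{e_\alpha}\, \frac{c_{\min}(S_{11} - S_{12}S_{22}^{-1}S_{12}')}{c_{\max}(S_{22})} \le c_{\min}(\lambda\lambda') \le e_\alpha \min\left\{\frac{c_{\min}(S_{11})}{c_{\min}(S_{22} - S_{12}'S_{11}^{-1}S_{12})},\ \frac{c_{\max}(S_{11})}{c_{\max}(S_{22} - S_{12}'S_{11}^{-1}S_{12})}\right\}, \tag{3.33}$$

$$\frac{1}{e_\alpha} \max\left[\frac{c_{\min}(S_{11} - S_{12}S_{22}^{-1}S_{12}')}{c_{\min}(S_{22})},\ \frac{c_{\max}(S_{11} - S_{12}S_{22}^{-1}S_{12}')}{c_{\max}(S_{22})}\right] \le c_{\max}(\lambda\lambda') \le e_\alpha\, \frac{c_{\max}(S_{11})}{c_{\min}(S_{22} - S_{12}'S_{11}^{-1}S_{12})}. \tag{3.34}$$

It is interesting to use [3] and check that the lower bound of (3.33) is $\le$ the
upper bound of (3.34), but that the upper bound of (3.33) might be $\ge$ or < the
lower bound of (3.34). However, it is to be always remembered that $c_{\min}(\lambda\lambda') \le
c_{\max}(\lambda\lambda')$, which should imply an obvious restriction on combined bounds on
$c_{\max}(\lambda\lambda')$ and $c_{\min}(\lambda\lambda')$.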

Truncation. Going back to (3.24) again we can proceed as in [4], equate to zero
any element of $a_1$ and the corresponding elements of $b_2$, $a_2$, and $b_1$ (it has to be
the corresponding elements, in order to make the process physically meaningful)
and then apply the process of maximization, minimization, etc., leading
ultimately to the same kind of statements as (3.33) and (3.34) in terms, however, of
truncated matrices everywhere, with one variate of the first p-set and the
corresponding variate of the second p-set being cut out. Thus there will be $\binom{p}{1}$,
i.e., p pairs of such statements. Likewise equating to zero any two elements of
$a_1$ and the corresponding elements of $b_2$, $a_2$ and $b_1$, we are ultimately led to $\binom{p}{2}$
pairs of statements like (3.33) and (3.34) based on different possible sets of
(p − 2) variates, and so on. Ultimately we have $1 + \binom{p}{1} + \binom{p}{2} + \cdots + \binom{p}{p-1}$,
i.e., $2^p - 1$ pairs of statements like (and including) (3.33) and (3.34) with a
joint probability $\ge 1 - \alpha$. It should be noticed that in all these statements $e_\alpha$,
however, stays the same.

It follows from the above remarks that, with a joint confidence coefficient
$\ge 1 - \alpha$, (3.33) and (3.34) imply, among other things, the following set of
confidence statements on the ratios $\sigma_{1i}^2/\sigma_{2i}^2$:

$$\frac{1}{e_\alpha}\,\frac{s_{1i}^2}{s_{2i}^2}\,(1 - r_i^2) \le \frac{\sigma_{1i}^2}{\sigma_{2i}^2} \le e_\alpha\,\frac{s_{1i}^2}{s_{2i}^2(1 - r_i^2)} \quad \text{for } i = 1, 2, \cdots, p, \tag{3.34.1}$$

where $s_{1i}^2$, $s_{2i}^2$, $\sigma_{1i}^2$, $\sigma_{2i}^2$ and $r_i$ stand respectively for the sample variances of the
ith variate for the two sets, the population variances of the ith variate for the
two sets and the sample correlation coefficient between the ith variate for the
first set and for the second set.

Interpretation of the role of the characteristic roots of $\lambda\lambda'$. The characteristic
roots of $\lambda\lambda'$, i.e., of $\mu\nu^{-1}\nu'^{-1}\mu'$, are all equal to unity if and only if $\mu\nu^{-1}\nu'^{-1}\mu'$ is an
identity matrix, i.e., if and only if

$$\mu\nu^{-1} = A, \text{ i.e., } \mu = A\nu, \tag{3.35}$$

where A is any arbitrary orthogonal matrix. Going back to (3.2), we easily check
that (3.35) implies

$$\Sigma_{11} = A\Sigma_{22}A', \tag{3.36}$$

which, if we recall that A is orthogonal, and $\Sigma_{11}$ and $\Sigma_{22}$ are symmetric, is
precisely the condition that $\Sigma_{11}$ and $\Sigma_{22}$ are to be similar matrices. Furthermore,
using (3.2) again it is easy to see that (3.35) also implies

$$\Sigma_{12} = \mu D_{\sqrt\gamma}\,\nu' = A\nu D_{\sqrt\gamma}\,\nu' = A \times \text{a symmetric matrix}, \tag{3.37}$$

where A is the same orthogonal matrix that occurs in (3.36). Thus (3.35) implies
(3.36) and (3.37) and it is also easy to verify that (3.36) and (3.37) imply (3.35).
Hence all the characteristic roots of $\lambda\lambda'$, i.e., of $\mu\nu^{-1}\nu'^{-1}\mu'$, being unity is a
necessary and sufficient condition that the relations (3.36) and (3.37) should hold.
The deviation of these characteristic roots from unity might be regarded as a
(joint) measure of departure from the hypothesis given by (3.36) and hence
(3.37), of which a very special case is the one that we get for the bivariate problem.
Further statistical implications of (3.36) and (3.37) will be discussed in a later
paper.


4. Confidence bounds for the case (iii). Starting from the bivariate normal
distribution characterized in section 2, put $q = \xi_1/\xi_2$ and introduce a new
variate $z = x_1 - qx_2$ (assume that $\xi_2 \ne 0$, i.e., $q \ne \pm\infty$). Then z is $N(0, \sigma_z^2)$,
where $\sigma_z^2 = \sigma_1^2 - 2q\rho\sigma_1\sigma_2 + q^2\sigma_2^2$. Thus

$$\sqrt n\,\bar z/s_z = \sqrt n\,(\bar x_1 - q\bar x_2)/(s_1^2 - 2qs_1s_2r + q^2s_2^2)^{1/2}$$

has the (central) t-distribution with d.f. (n − 1), so that we can find a $t_{\alpha/2}$ such
that

$$P[n(\bar x_1 - q\bar x_2)^2/(s_1^2 - 2qs_1s_2r + q^2s_2^2) \le t_{\alpha/2}^2] = 1 - \alpha$$

or

$$P[(\bar x_2^2 - ks_2^2)q^2 - 2(\bar x_1\bar x_2 - ks_1s_2r)q + (\bar x_1^2 - ks_1^2) \le 0] = 1 - \alpha, \tag{4.1}$$

where $k = (1/n)t_{\alpha/2}^2$. We can use the statement within the brackets in (4.1)
as an acceptance region for the hypothesis that the population ratio of means
has a specific value q. But such an acceptance region is, of course, well known, at least
in an implicit form.

Subject to the restriction that q is to have real values, the statement within
the brackets in (4.1) gives the confidence bounds on $q = \xi_1/\xi_2$. There is also
the further restriction that (4.1) is supposed to be a probability statement on
$\bar x_1$, $\bar x_2$, $s_1$ and $s_2$ for all real values of $q = \xi_1/\xi_2$, except for $\xi_2 = 0$, i.e., for $q =
\pm\infty$. Equating to zero the expression on the left side of the inequality
statement under the probability sign in (4.1), we have an equation in q whose
coefficients involve stochastic variates. The actual confidence bounds on q are
given by

$$\frac{(\bar x_1\bar x_2 - ks_1s_2r) - [(\bar x_1\bar x_2 - ks_1s_2r)^2 - (\bar x_1^2 - ks_1^2)(\bar x_2^2 - ks_2^2)]^{1/2}}{\bar x_2^2 - ks_2^2} \le q \le \frac{(\bar x_1\bar x_2 - ks_1s_2r) + [(\bar x_1\bar x_2 - ks_1s_2r)^2 - (\bar x_1^2 - ks_1^2)(\bar x_2^2 - ks_2^2)]^{1/2}}{\bar x_2^2 - ks_2^2}. \tag{4.2}$$

The bounds will be physically meaningful only if the expression under the
radical is non-negative, i.e., only if

$$\frac{\bar x_1^2}{s_1^2} + \frac{\bar x_2^2}{s_2^2} \ge 2\,\frac{\bar x_1\bar x_2}{s_1s_2}\,r + k(1 - r^2). \tag{4.3}$$

Notice that $(\bar x_1^2/s_1^2) + (\bar x_2^2/s_2^2)$ is always greater than or equal to $2(\bar x_1/s_1)(\bar x_2/s_2)r$
but may not always be greater than or equal to the right side of (4.3). Thus,
if in the sample, the inequality (4.3) breaks down we should not, in that
situation, attempt to put any confidence bounds on $\xi_1/\xi_2$.

Going back to (4.1) and tying it up with (4.2) and (4.3) we now observe that
$\alpha$ is the probability of choosing a sample such that either (4.2) is not a real
interval or (4.2) is real but does not cover the true value.
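
The bounds on the ratio of means amount to solving a quadratic in q. The following sketch is an illustration under stated assumptions rather than the authors' procedure: the function name is hypothetical, the critical value $t_{\alpha/2}(n-1)$ is supplied from tables, and the sketch additionally declines to return bounds when the leading coefficient $\bar x_2^2 - ks_2^2$ is not positive, since the solution set of the inequality is then not the interval between the two roots (a restriction added here, not stated in the text).

```python
import math


def ratio_of_means_bounds(m1, m2, s1, s2, r, n, t_crit):
    """Bounds on xi_1 / xi_2 as the roots of a*q^2 - 2*b*q + c = 0.

    m1, m2 : sample means; s1, s2 : sample standard deviations;
    r : sample correlation; n : sample size;
    t_crit : upper alpha/2 point of the t-distribution with n - 1 d.f.
    Returns None when no finite interval between real roots exists.
    """
    k = t_crit ** 2 / n
    a = m2 ** 2 - k * s2 ** 2
    b = m1 * m2 - k * s1 * s2 * r
    c = m1 ** 2 - k * s1 ** 2
    disc = b * b - a * c            # expression under the radical
    if disc < 0 or a <= 0:          # a <= 0 is excluded by assumption in this sketch
        return None
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


# Example: n = 25, so d.f. = 24 and the tabulated two-sided 5% point is 2.064.
bounds = ratio_of_means_bounds(10.0, 5.0, 2.0, 1.0, 0.3, 25, 2.064)
```

Here the sample ratio 10/5 = 2 lies inside the returned interval, while a sample with a small $\bar x_2$ relative to $s_2$ yields `None`.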


5. Confidence bounds for the case (iv). Starting from the (p + p)-variate
normal distribution characterized in section 3, define a set of q's, $q_1, q_2, \cdots, q_p$,
by $\xi_1 = D_q\xi_2$, where $D_q(p \times p)$ is a diagonal matrix whose diagonal elements
are $q_1, \cdots, q_p$. Introduce a new variate $z(p \times 1)$ defined by

$$z(p \times 1) = [I\ \ {-D_q}]\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = A(p \times 2p)\,x(2p \times 1) \text{ (say)}. \tag{5.1}$$

It is easy to check that $E(z) = \xi_1 - D_q\xi_2 = 0$, whence z is $N(0, \Sigma_z)$ where
$\Sigma_z = A\Sigma A'$. Also, given the sample dispersion matrix of $x(2p \times 1)$, in the form

$$S(2p \times 2p) = \begin{bmatrix} S_{11} & S_{12} \\ S_{12}' & S_{22} \end{bmatrix}, \tag{5.2}$$

we have the sample dispersion matrix of $z(p \times 1)$ given by

$$S_z = ASA' = S_{11} - D_qS_{12}' - S_{12}D_q + D_qS_{22}D_q. \tag{5.3}$$

Also the sample mean vector of $z(p \times 1)$ is given by

$$\bar z = \bar x_1 - D_q\bar x_2. \tag{5.4}$$

Thus, with the q's defined as above, $n\bar z'S_z^{-1}\bar z$ is distributed as (central)
Hotelling's $T^2$, which means that we can find a $T_\alpha^2$ such that

$$P\left[\bar z'S_z^{-1}\bar z \le \frac{1}{n}T_\alpha^2 \,\Big|\, q\text{'s defined as above}\right] = 1 - \alpha. \tag{5.5}$$

The set over which the probability statement (5.5) is made can be used as an
acceptance region for the hypothesis that the population mean ratios have
specific values $q_i$'s. This, of course, is implicit in the possible applications of
Hotelling's $T^2$. Now consider the statement within the brackets in (5.5).
It is well known that this statement is equivalent to the statement that all
$c[\bar z\bar z'S_z^{-1}] \le T_\alpha^2/n$, which again is equivalent to

$$\frac{(a'\bar z)^2}{a'a} \le \frac{1}{n}T_\alpha^2\,\frac{a'S_za}{a'a} \tag{5.6}$$

for all non null $a(p \times 1)$'s. Considering the left side of (5.6), we use
again Cauchy's inequality to obtain that for all non null a's, $|a'\bar z|/(a'a)^{1/2} \le (\bar z'\bar z)^{1/2}$,
whence we see that under variation of a the largest value of the left side of
(5.6) $= \bar z'\bar z$, that is, $= \sum_{i=1}^p (\bar x_{1i} - q_i\bar x_{2i})^2$, where $\bar x_{1i}$ and $\bar x_{2i}$ (for $i = 1, 2, \cdots,
p$) stand for the ith elements of the vectors $\bar x_1$ and $\bar x_2$. We also note that, aside
from the constant factor $T_\alpha^2/n$, the largest value of the right side of (5.6) under
variation of a's is $c_{\max}(S_z)$, i.e., $c_{\max}(ASA')$, i.e., $c_{\max}(SA'A)$. Now we use
[1] to obtain that

$$c_{\max}(SA'A) \le c_{\max}(S)\,c_{\max}(A'A), \text{ i.e., } \le c_{\max}(S)\,c_{\max}(AA'), \text{ i.e., } \le c_{\max}(S)\,c_{\max}[I + D_{q^2}], \text{ i.e., } \le c_{\max}(S)\max\{1 + q_1^2,\ 1 + q_2^2, \cdots, 1 + q_p^2\}. \tag{5.7}$$
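Now, if we go back to (5.6) and maximize the left side w.r.t. a, it is easy to
check that (5.6) implies

$$\frac{1}{n}T_\alpha^2\, c_{\max}(S)\max[1 + q_1^2,\ 1 + q_2^2, \cdots, 1 + q_p^2] - \sum_{i=1}^p (\bar x_{1i} - q_i\bar x_{2i})^2 \ge 0$$

or

$$\frac{1}{n}T_\alpha^2\, c_{\max}(S)\max[1 + q_1^2, \cdots, 1 + q_p^2] - \sum_{i=1}^p \bar x_{1i}^2 - \sum_{i=1}^p q_i^2\bar x_{2i}^2 + 2\sum_{i=1}^p q_i\bar x_{1i}\bar x_{2i} \ge 0. \tag{5.8}$$

Also notice that

$$\left|\sum_{i=1}^p q_i\bar x_{1i}\bar x_{2i}\right| \le \sum_{i=1}^p |q_i|\,|\bar x_{1i}\bar x_{2i}| \le [\max(q_1^2, \cdots, q_p^2)]^{1/2}\sum_{i=1}^p |\bar x_{1i}\bar x_{2i}|, \qquad -\sum_{i=1}^p q_i^2\bar x_{2i}^2 \le -\min(q_1^2, \cdots, q_p^2)\sum_{i=1}^p \bar x_{2i}^2. \tag{5.9}$$

Hence it is easy to check that (5.8) and hence (5.6) imply

$$\frac{1}{n}T_\alpha^2\, c_{\max}(S)\max[1 + q_1^2, \cdots, 1 + q_p^2] + 2\sum_{i=1}^p |\bar x_{1i}\bar x_{2i}|\,[\max(q_1^2, \cdots, q_p^2)]^{1/2} - \bar x_1'\bar x_1 - \bar x_2'\bar x_2\min(q_1^2, \cdots, q_p^2) \ge 0. \tag{5.10}$$

Going back to (5.5) we now observe that with a probability $\ge 1 - \alpha$, we have
the confidence statement (5.8) or (5.10).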

Truncation. Here again, as in section 3, it is possible to go back to (5.6),
proceed in the same way as before and get statements like (5.8) or (5.10) on
any (p − 1) variate-pairs, or on any (p − 2) variate-pairs, and so on, and
finally any variate-pair, thus ultimately obtaining $2^p - 1$ confidence
statements like (5.8) or (5.10), all of them with a joint confidence coefficient $\ge 1 - \alpha$.

If we are interested in pairwise comparisons we go back to (5.6), set $k = T_\alpha^2/n$
and choose a to be the vector with 1 in the ith position and 0's elsewhere. The
resulting inequality can be written as (4.2) (with $k = T_\alpha^2/n$). Thus (5.6)
implies a set of inequalities like this for $i = 1, 2, \cdots, p$, and hence, with a
confidence coefficient greater than or equal to a preassigned $1 - \alpha$, we have the
set of confidence bounds on $\xi_{1i}/\xi_{2i}$ given by

$$(e_{1i} - e_{2i}^{1/2})/e_{3i} \le \xi_{1i}/\xi_{2i} \le (e_{1i} + e_{2i}^{1/2})/e_{3i}, \tag{5.11}$$

where, for $i = 1, 2, \cdots, p$,

$$e_{1i} = \bar x_{1i}\bar x_{2i} - ks_{1i}s_{2i}r_{12i}, \qquad e_{3i} = \bar x_{2i}^2 - ks_{2i}^2, \qquad e_{2i} = (\bar x_{1i}\bar x_{2i} - ks_{1i}s_{2i}r_{12i})^2 - (\bar x_{1i}^2 - ks_{1i}^2)(\bar x_{2i}^2 - ks_{2i}^2). \tag{5.12}$$

As in section 4, the bounds will be physically meaningful only if

$$\frac{\bar x_{1i}^2}{s_{1i}^2} + \frac{\bar x_{2i}^2}{s_{2i}^2} \ge 2\,\frac{\bar x_{1i}\bar x_{2i}}{s_{1i}s_{2i}}\,r_{12i} + k(1 - r_{12i}^2). \tag{5.13}$$

As in section 4 so also here, the remarks made after (4.3) will be pertinent
again as an indication of how to use these bounds.

In conclusion it is a great pleasure to thank the referee and the associate
editor for their valuable comments and suggestions. The result (5.11), in
particular, is entirely due to the referee and provides shorter bounds than the ones
obtained by the authors originally, starting from (5.10) rather than directly
from (5.6).


A THREE-SAMPLE KOLMOGOROV-SMIRNOV TEST

By Herbert T. David
Iowa State College


1. Introduction. In 1951, Gnedenko and Korolyuk published an elegant
derivation ([6])¹ of the null distribution of the Kolmogorov-Smirnov statistic
$D_{2,n}$ for two samples of equal size n. The statistic $D_{2,n}$ is given by

$$D_{2,n} = \sup_t |F_{2,n}(t) - F_{1,n}(t)|, \tag{1}$$

where $F_{i,n}(t)$ is the sample cumulative distribution function for the ith sample.
The distribution derived by Gnedenko and Korolyuk is

$$\Pr\left\{D_{2,n} \ge \frac{l}{n}\right\} = 2\binom{2n}{n}^{-1}\sum_{i=1}^{[n/l]} (-1)^{i+1}\binom{2n}{n - il}. \tag{2}$$


Since 


(3) lim k. ( - 


(2) easily leads to the familiar asymptotic result 


( oe 
(4) lim Pr ¢ n'* De, = | = > (— 1), 
\ i=l 


n~o 


Gnedenko and Korolyuk's proof hinges on the fact that, in the null case
(for two samples drawn from the same continuous distribution), $\Pr\{D_{2,n} \ge l/n\}$
equals the probability that the maximum deviation from the origin of a certain
random walk in the line is at least l. The random paths involved in this random
walk start at the origin, and consist of 2n unit steps, n to the left and n to the
right, with all possible permutations of left and right steps equally likely. The
probability $\Pr\{D_{2,n} \ge l/n\}$ is thus equal to, say, $M/\binom{2n}{n}$, where $\binom{2n}{n}$ is the
total number of equally likely paths, and M is the number of these paths with
maximum deviation from the origin at least l. M can be computed by the
reflection principle in the line ([2], [1]), leading to (2).
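
The random-walk description lends itself to a direct check for small n. The sketch below is an illustration only: it evaluates the alternating binomial sum $2\sum_{i=1}^{[n/l]}(-1)^{i+1}\binom{2n}{n-il}/\binom{2n}{n}$, which is the standard form of the Gnedenko-Korolyuk result, and compares it with an exhaustive count of the paths whose maximum deviation from the origin is at least l. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def gk_tail(n, l):
    """Pr{D_{2,n} >= l/n} from the alternating binomial sum."""
    total = sum((-1) ** (i + 1) * comb(2 * n, n - i * l)
                for i in range(1, n // l + 1))
    return 2 * total / comb(2 * n, n)


def brute_force_tail(n, l):
    """Same probability by enumerating all paths with n right and n left steps."""
    hits = 0
    for right_steps in combinations(range(2 * n), n):
        right_steps = set(right_steps)
        position, worst = 0, 0
        for k in range(2 * n):
            position += 1 if k in right_steps else -1
            worst = max(worst, abs(position))
        if worst >= l:
            hits += 1   # this path counts towards M
    return hits / comb(2 * n, n)
```

For instance, with n = 3 and l = 2 only the 8 paths that alternate within distance 1 of the origin fail to reach deviation 2, so both functions give 12/20.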

In this paper I show that the null distribution of the three-sample extension
$D_{3,n}$ (see (6) below) of $D_{2,n}$ can be derived by extending the geometric approach
of [6] from the line to the plane.


Received October 21, 1957; revised February 25, 1958. 
1 The review of this paper in Mathematical Reviews [3] was brought to my attention by
Murray Rosenblatt.

$D_{3,n}$ is but one of several "distance" criteria that have recently appeared in
the literature. Fisz and Kiefer [4], [7]² have shown that the criterion

$$R_n = \max\left\{\sup_t |F_{3,n}(t) - F_{2,n}(t)|,\ \sup_t |2F_{1,n}(t) - F_{2,n}(t) - F_{3,n}(t)|\right\},$$

and extensions of $R_n$ to k samples and unequal sample sizes, can be used with
existing Kolmogorov-Smirnov tables because the events

$$A: \left[\sup_t |F_{3,n}(t) - F_{2,n}(t)| \le \lambda_1\right]$$

and

$$B: \left[\sup_t |2F_{1,n}(t) - F_{2,n}(t) - F_{3,n}(t)| \le \lambda_2\right]$$

are independent. It may be of interest to note that the criterion $R_n$ corresponds
to using a rectangular boundary on the hexagonal grid of Figure 1, and that
the independence of the events A and B, and the distribution of $R_n$, follow easily
from this representation.


Ozols’ [8]? treatment of the criterion 
S, = max {sup (F;,,(t) — Fen(t)), sup (Fon(t) — Fin(t))}, 
t t 
is similar to my treatment of the criterion D;,,. The boundary corresponding 


to S, is an infinite 60° wedge on the hexagonal grid of Figure 1. 
Finally, Kiefer [7] and Gihman [5] consider a criterion 7’, (or Di) of form 


t | 


sup (= (F;,.(0) — F.()'), F,(t) = > F;,n(t)/k, 


t=1 


and extensions of this criterion to unequal sample sizes; Kiefer [7] also considers 
the k-sample extension V, of the statistic (5) given below in section 2. 

Kiefer has shown in [7] that “distance’”’ criteria of the type discussed above 
have good power properties. Among such criteria, one might suspect on heuristic 
grounds that D;,, has especially good power characteristics against the ‘“‘one- 
sided” alternative H4:[((X < Y < Z)or(¥ <Z< X)or(Z < X < Y)j. 
This is because H, tends to generate paths, on the grid of Figure 1, in the direc- 
tions 2/6, 7/6, +27/3, or r/6 + 42/3. 


2. A three-sample Kolmogorov-Smirnov statistic and its small-sample null
distribution. A natural three-sample extension of (1) would be

$$\max\left\{\sup_t |F_{2,n}(t) - F_{1,n}(t)|,\ \sup_t |F_{3,n}(t) - F_{2,n}(t)|,\ \sup_t |F_{1,n}(t) - F_{3,n}(t)|\right\}. \tag{5}$$

2 I owe these references to an associate editor.
3 I owe this reference to Milton Sobel.

But (5) does not lend itself easily to an extension of Gnedenko and Korolyuk's
geometric method; a statistic that does so lend itself is that obtained from (5)
by deleting the absolute value signs:

$$D_{3,n} = \max\left\{\sup_t (F_{2,n}(t) - F_{1,n}(t)),\ \sup_t (F_{3,n}(t) - F_{2,n}(t)),\ \sup_t (F_{1,n}(t) - F_{3,n}(t))\right\}. \tag{6}$$

The null distribution of $D_{3,n}$ is its distribution when the three samples are drawn
from the same continuous population. This null distribution is derived as
follows.

A step of type A in the plane is defined to be a unit step to the right
(direction 0); a step of type B is a unit step in the direction $2\pi/3$, and a step of type
C is a unit step in the direction $4\pi/3$.

In the null case considered, ties occur with probability zero; hence (almost)
every set of three samples of n leads to a ranking of the 3n sample values
making up the three samples. Corresponding to each set of three samples, consider
a path $p_{3,n}$ from the origin, composed of 3n unit steps, with the kth step of
$p_{3,n}$ of type A if the rank k belongs to the first sample, etc. Clearly every $p_{3,n}$
contains n steps of each of the three types A, B and C.

Next, consider the equilateral triangle in the plane that is centered at the
origin, has sides of length 3l, and is oriented such that one of its sides is
horizontal. Call this equilateral triangle $\Gamma_l$. Clearly

$$\left\{D_{3,n} \ge \frac{l}{n}\right\} = \{(p_{3,n} \cap \Gamma_l) \text{ is not empty}\}. \tag{7}$$

But in the null case every path $p_{3,n}$ (permutation of 3n steps, n each of type
A, B and C) is possible, and each of the $(3n)!/(n!)^3$ such paths is equally likely.
Hence (7) implies

$$\Pr\left\{D_{3,n} \ge \frac{l}{n}\right\} = \Pr\{(p_{3,n} \cap \Gamma_l) \text{ is not empty}\} = N\big/[(3n)!/(n!)^3], \tag{8}$$

where N is the number of paths $p_{3,n}$ touching or piercing $\Gamma_l$. The small-sample
problem is therefore solved if N can be evaluated.
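
For very small n the count N can be obtained by exhaustive enumeration, which gives a reference value against which any closed-form evaluation can be compared. The sketch below is an illustration and not the reflection argument of the paper: it works directly with the definition of $D_{3,n}$, tracking the numbers a, b, c of values from samples 1, 2, 3 among the smallest ranks, so that the event $D_{3,n} \ge l/n$ occurs as soon as max(b − a, c − b, a − c) reaches l. The function name is hypothetical.

```python
from math import factorial


def count_paths(n, l):
    """Return (N, total): paths with D_{3,n} >= l/n, and all equally likely paths."""

    def walk(a, b, c, hit):
        # a, b, c: steps of type A, B, C taken so far (sample counts at this rank)
        if a == n and b == n and c == n:
            return 1 if hit else 0
        count = 0
        for da, db, dc in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            na, nb, nc = a + da, b + db, c + dc
            if max(na, nb, nc) > n:
                continue
            # n*(F2 - F1), n*(F3 - F2), n*(F1 - F3) evaluated after this step
            reached = max(nb - na, nc - nb, na - nc) >= l
            count += walk(na, nb, nc, hit or reached)
        return count

    total = factorial(3 * n) // factorial(n) ** 3
    return walk(0, 0, 0, False), total


N, total = count_paths(2, 2)
probability = N / total   # Pr{D_{3,2} >= 2/2} by direct enumeration
```

With l = 1 every path qualifies after its first step, so `count_paths(n, 1)` returns N equal to the total; the enumeration grows as $(3n)!/(n!)^3$ and is only practical for small n.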

N is evaluated by extending to the plane the principle of reflection that
yielded M. Consider a hexagonal grid in the plane, consisting of "®" points
and "©" points, as indicated in figure 1 for the case (n = 7, l = 2). The
extent of the grid is fixed by the fact that the distance between the origin 0 and
each of the three "vertices" (indicated by the letters $V_1$, $V_2$, $V_3$ in figure 1) is
$(3l)([n/l])$. This distance is of course $(3 \cdot 2)([7/2]) = 18$ for the case illustrated
by figure 1. The central triangle indicated by the heavy line in figure 1
represents $\Gamma_l$.

Fig. 1

Any path from the origin 0 to a @ point, that consists of 3n steps of type 
A, B or C, is called a path of type ®. A path of type © is defined similarly. A 
path of type ® or © is called an auxiliary path +. Again, any path from the 
origin to the origin, that consists of 3n steps of type A, B or C, and that touches 
the boundary [,, is called a boundary path 8. Finally, Ne, Ne and N> are, 
respectively, the total number of paths of type @, the total number of paths of 
type ©, and the total number of boundary paths. 

The argument now is as follows. 


For any particular endpoint, whether it be a © point, a © point, or the 
origin 0, the specification that there be 3n steps in a boundary path or 
auxiliary path from the origin to that endpoint actually determines the 
numbers m4 , mz and mc of steps of types A, B and C involved in the path. 


(9) 
(9) follows from the fact that the location of the endpoint provides three equations in $m_A$, $m_B$ and $m_C$, which, together with

(10a) $m_A + m_B + m_C = 3n,$

yield $m_A$, $m_B$ and $m_C$. These three equations are

(10b) $m_C - m_B = K_1$
(10c) $m_A - m_C = K_2$
(10d) $m_B - m_A = K_3.$

$K_1$, $K_2$ and $K_3$ are determined by the signed perpendicular distances $d_1$, $d_2$ and $d_3$ of the endpoint from the lines $L_1$, $L_2$ and $L_3$ (see figure 1). For example, $K_1 = 2d_1/3^{1/2}$ if the endpoint is $d_1$ units below $L_1$, and $K_1 = -2d_1/3^{1/2}$ if the endpoint is $d_1$ units above $L_1$.
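The way the endpoint fixes the three step counts can be sketched in a few lines. This is only an illustration: the function name `step_counts` and the convention of returning `None` for an unreachable endpoint are assumptions made here, not part of the paper.

```python
from fractions import Fraction

def step_counts(n, k1, k2, k3):
    """Solve (10a)-(10d) for (m_A, m_B, m_C), given n and K_1, K_2, K_3.

    Hypothetical helper: returns None when the solution is not a triple
    of non-negative integers, i.e. when no path of 3n steps can end there.
    """
    # Adding (10b), (10c) and (10d) gives 0 = K_1 + K_2 + K_3.
    if k1 + k2 + k3 != 0:
        raise ValueError("K_1 + K_2 + K_3 must vanish")
    # Substituting m_C = m_B + K_1 and m_A = m_B + K_1 + K_2 into (10a).
    m_b = Fraction(3 * n - 2 * k1 - k2, 3)
    m_c = m_b + k1   # (10b): m_C - m_B = K_1
    m_a = m_c + k2   # (10c): m_A - m_C = K_2
    counts = (m_a, m_b, m_c)
    if any(m.denominator != 1 or m < 0 for m in counts):
        return None
    return tuple(int(m) for m in counts)
```

With $K_1 = K_2 = K_3 = 0$ the sketch returns $(n, n, n)$, which is the case of a path returning to the origin.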

In particular, for every path from the origin to the origin, (10b), (10c) and (10d) become $m_C - m_B = m_A - m_C = m_B - m_A = 0$, which, together with (10a), yield $m_A = m_B = m_C = n$. This last implies that the boundary paths are exactly the paths enumerated by $N$, or

(11) $N_\beta = N.$


Next, we introduce the operation of reflection. Reflection is an operation performed on an auxiliary path $\pi$ that yields a path $\rho(\pi)$ which can be either an auxiliary path or a boundary path. Reflection is defined as follows.

Let $\pi$ be an auxiliary path whose last point of contact (proceeding along $\pi$ from the origin) with $\Gamma_l$ is the point $u$.

1) Suppose first that $u$ is not a vertex of $\Gamma_l$. Suppose for example that $u$ lies within the horizontal side of $\Gamma_l$ (i.e. the side oriented in the direction of a step of type A). Then $\rho(\pi)$ is obtained from $\pi$ by replacing every step of type B occurring after $u$ by a step of type C, and every step of type C occurring after $u$ by a step of type B. Analogously, if $u$ lies within the side of $\Gamma_l$ oriented in the direction of a step of type B, then $\rho(\pi)$ is obtained from $\pi$ by replacing steps of type A occurring after $u$ by steps of type C, and vice versa; if $u$ lies within the side of $\Gamma_l$ oriented in the direction of a step of type C, the transposition of step types involves types A and B.

For example, reflection of the path 7m, (see figure 1) leads to the path 7; . 

2) If $u$ is a vertex of $\Gamma_l$, then reflection consists, as in 1), of a transposition of two step types. Which two step types are involved is determined by the requirement that the step occurring immediately after $u$ be converted into a step lying in $\Gamma_l$. Thus, for example, if $u$ is the vertex of $\Gamma_l$ lying on both of the two non-horizontal sides of $\Gamma_l$, and if the step occurring immediately after $u$ is a step of type A, then the two step types involved in the transposition are types A and C; in other words, $\rho(\pi)$ is obtained from $\pi$ by replacing every step of type A occurring after $u$ by a step of type C, and every step of type C occurring after $u$ by a step of type A.





The operation of reflection, performed on an auxiliary path $\pi$, yields a path $\rho(\pi)$ which 1) contains $3n$ steps; 2) contains no steps of types other than A, B and C; and 3) begins at the origin and ends at a ⊕ point, at a ⊖ point, or at the origin. (Endpoints exterior to the grid of figure 1, such as the endpoint of the path $\pi_1$ for example, cannot result from reflection, because, for such endpoints, equations (10) have at least one negative solution.) Finally, it is clear that: 4) the number of steps of $\pi$ exterior to $\Gamma_l$, from the point of last contact of $\pi$ with $\Gamma_l$ to the endpoint of $\pi$, is greater by at least one than the number of steps of $\rho(\pi)$ exterior to $\Gamma_l$, from the point of last contact of $\rho(\pi)$ with $\Gamma_l$ to the endpoint of $\rho(\pi)$.

By 1), 2) and 3), $\rho(\pi)$ is either an auxiliary path or a boundary path, and, by 4), successive reflections $\rho_1(\pi)$, $\rho_2(\rho_1(\pi))$, $\rho_3(\rho_2(\rho_1(\pi)))$, $\dots$ eventually lead to a boundary path, say $\rho_t(\rho_{t-1}(\cdots \rho_1(\pi) \cdots))$; this boundary path is called the image $\beta(\pi)$ of $\pi$.

Our discussion of reflection can be summarized by:

(12) To every auxiliary path $\pi$ there corresponds a unique image path $\beta(\pi)$, which is a boundary path obtained from $\pi$ by successive reflections.

Further,

(13) Among all the auxiliary paths with the same image path, the number of paths of type ⊕ exceeds the number of paths of type ⊖ by one.

(13) follows from the fact that the auxiliary paths with the same image path $\beta$ come in pairs of type (⊕, ⊖), as illustrated in figure 1 by paths $\pi_2$ and $\pi_3$, except for a single "bachelor" path of type ⊕ from the origin to one of the three ⊕ points immediately next to $\Gamma_l$.

The bachelor path of type ⊕ is the auxiliary path yielding $\beta$ after only one reflection; it is uniquely defined for any boundary path $\beta$, and is constructed from $\beta$ as follows. Let $v$ be the last point of contact of $\beta$ with $\Gamma_l$, proceeding along $\beta$ from the origin in accordance with the directions associated with each of the three step types. (Note that $\beta$ has at least one point of contact with $\Gamma_l$, since $\beta$ is a boundary path.) The bachelor auxiliary path is constructed from $\beta$ by "reflecting" the portion of $\beta$ following $v$. (The word "reflection" is put in quotes because, up to now, reflection has been defined only as an operation on auxiliary paths. But the construction involved here is entirely analogous to the earlier operation.) For example, if $v$ lies in the horizontal side of $\Gamma_l$, then "reflection" of the portion of $\beta$ following $v$ consists of replacing every step of type B by a step of type C, and every step of type C by a step of type B; the procedure is analogous if $v$ lies in one of the other two sides of $\Gamma_l$. (Note that $v$ is never a vertex of $\Gamma_l$.)

The pairing of the other auxiliary paths with image $\beta$ is accomplished by "reflection" about the last point of contact with the triangular grid lines indicated by the dashed lines in figure 1. (The word "reflection" again is put in quotes, because the usage here does not correspond exactly to the operation yielding $\rho(\pi)$ from $\pi$.) For example, consider an auxiliary path $\pi_2$ with image $\beta$, and let the last point of contact of $\pi_2$ with the triangular grid lines be $w$; suppose for example that $w$ lies on a grid line oriented in the direction of a step of type B (as illustrated in figure 1). Then, as indicated in figure 1, the mate $\pi_3$ of $\pi_2$ is obtained from $\pi_2$ by replacing every step of type C occurring after $w$ by a step of type A, and every step of type A by a step of type C. The same "reflection" operation, applied to $\pi_3$, yields $\pi_2$, which establishes the pairing.

That $\pi_2$ and its mate $\pi_3$ have the same image $\beta$ is best verified by imagining $\pi_2$ and $\pi_3$ as undergoing reflection simultaneously.

Except for the single bachelor path, auxiliary paths with the same image thus come in pairs of type (⊕, ⊖), except possibly in the case of an auxiliary path, such as that indicated by $\pi_0$ in figure 1, whose potential mate $\pi_1$ is not one of the auxiliary paths. However, auxiliary paths such as $\pi_0$ do not exist, and this is shown as follows.

Suppose there were an auxiliary path, such as $\pi_0$, to an endpoint at the outer edge of the hexagonal grid of ⊕ points and ⊖ points, which entered the triangular cell containing this endpoint from an "exterior" side of the cell. The four equations (10a), (10b), (10c) and (10d) yield $m_C = n - l[n/l]$ for any auxiliary path to any endpoint between the two vertices $V_1$ and $V_2$. (Correspondingly $m_B = n - l[n/l]$ and $m_A = n - l[n/l]$ for the other two sets of "outer" endpoints.) Hence, if $\pi_0$ existed, it would contain $n - l[n/l]$ steps of type C. But then $\pi_1$ would contain $n - l[n/l] - l$ steps of type C, which could not be because $n - l[n/l] - l$ is negative.

Finally,

(14) Every boundary path is the image of at least one auxiliary path,

because every boundary path is the image at least of its corresponding "bachelor" path.
(12), (13), and (14) imply

(15) $N_\beta = N_\oplus - N_\ominus.$

(15) is shown as follows. Let $\pi$ denote an auxiliary path, let $\beta$ denote a boundary path, and define the function $f(\pi, \beta)$ as follows.

$f(\pi, \beta) = 1$ if $\beta$ is the image of $\pi$, and $\pi$ is a path of type ⊕.
$f(\pi, \beta) = -1$ if $\beta$ is the image of $\pi$, and $\pi$ is a path of type ⊖.
$f(\pi, \beta) = 0$ if $\beta$ is not the image of $\pi$.

Now, for any fixed $\beta$,

$\sum_{\pi} f(\pi, \beta) = 1$

by (13) and (14), so that

(16) $\sum_{\beta} \left[ \sum_{\pi} f(\pi, \beta) \right] = N_\beta.$

Again, by (12), it is true for every fixed $\pi$ that

$\sum_{\beta} f(\pi, \beta) = +1$ for $\pi$ of type ⊕,
$\sum_{\beta} f(\pi, \beta) = -1$ for $\pi$ of type ⊖,

so that

(17) $\sum_{\pi} \left[ \sum_{\beta} f(\pi, \beta) \right] = N_\oplus - N_\ominus,$

and (15) follows from (16) and (17).
(11) and (15) yield

(18) $N = N_\oplus - N_\ominus.$


In view of (8), equation (18) represents the solution of the small-sample problem, because the computation of $N_\oplus$ and of $N_\ominus$ is straightforward. For example, $N_\oplus$ is the total number of paths of type ⊕, which is easily computed because the number of paths to any particular ⊕ point is given by the usual trinomial coefficient, the count being entirely unrestricted. The three arguments of this trinomial coefficient are the numbers of steps of types A, B and C involved in any auxiliary path to this ⊕ point; these numbers are of course fixed by the location of the ⊕ point, in view of equations (10). There remains only the problem of efficient enumeration of ⊕ points and ⊖ points; one such enumeration gives for $\Pr\{D_{3,n} \ge l/n\}$ the expression

(19) $3 \sum_{i=1}^{[n/l]} \sum_{j \in J(i)} (\pm)\, \dfrac{(n!)^3}{(n - il)!\,(n + jl)!\,(n + (i - j)l)!},$

where the set $J(i)$ consists of the integers $(2 - i, 3 - i, 5 - i, 6 - i, 8 - i, 9 - i, 11 - i, 12 - i, \dots, 2i)$, and where the $(\pm)$ sign indicates that, for fixed $i$, successive terms in the finite series indexed by $j$ have alternating signs, beginning with $+$ for $j = 2 - i$, $-$ for $j = 3 - i$, $+$ for $j = 5 - i$, etc.
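Expression (19) is easy to evaluate exactly, and for small $n$ it can be compared with a direct count of paths. The sketch below is illustrative only. The function names are invented here, and the direct count assumes that a path touches or pierces $\Gamma_l$ exactly when one of the differences in (10b)-(10d), taken over an initial segment of the path, reaches $l$; that reading of the boundary is an assumption, not a statement from the paper.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

def index_set(i):
    """J(i): integers j with i + j in {2, 3, 5, 6, 8, 9, ...} and j <= 2i."""
    return [s - i for s in range(2, 3 * i + 1) if s % 3 != 1]

def exact_tail(n, l):
    """Pr{D_{3,n} >= l/n} from expression (19), as an exact fraction."""
    total = Fraction(0)
    for i in range(1, n // l + 1):
        sign = 1  # signs alternate along J(i), starting with +
        for j in index_set(i):
            denom = (factorial(n - i * l) * factorial(n + j * l)
                     * factorial(n + (i - j) * l))
            total += sign * Fraction(factorial(n) ** 3, denom)
            sign = -sign
    return 3 * total

def brute_force_tail(n, l):
    """Same probability by counting paths that never reach the boundary."""
    @lru_cache(maxsize=None)
    def avoiding(a, b, c):
        # Paths with a, b, c steps of types A, B, C whose every prefix
        # keeps all three cyclic differences strictly below l.
        if max(c - b, a - c, b - a) >= l:
            return 0
        if a == b == c == 0:
            return 1
        ways = 0
        if a > 0:
            ways += avoiding(a - 1, b, c)
        if b > 0:
            ways += avoiding(a, b - 1, c)
        if c > 0:
            ways += avoiding(a, b, c - 1)
        return ways

    all_paths = factorial(3 * n) // factorial(n) ** 3
    return Fraction(all_paths - avoiding(n, n, n), all_paths)
```

For example, `exact_tail(3, 2)` and `brute_force_tail(3, 2)` both give 27/35, since 384 of the 1680 equally likely paths avoid the boundary.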


3. Large-sample distribution. The asymptotic distribution of $D_{3,n}$ is given by the following theorem.
THEOREM. For $\lambda n^{1/2}$ integral,

$\lim_{n \to \infty} \Pr\{n^{1/2} D_{3,n} \ge \lambda\} = 3 \sum_{i=1}^{\infty} \sum_{j \in J(i)} (\pm)\, e^{-\lambda^2 (i^2 - ij + j^2)},$

where the set $J(i)$ and the sign $(\pm)$ are as defined in (19).
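The limiting series converges quickly and can be evaluated numerically. The sketch below assumes that the general term is $e^{-\lambda^2(i^2 - ij + j^2)}$, which is what the limit (20) gives when $l = \lambda n^{1/2}$ is put in (19); the function names and the truncation parameter `max_i` are choices made here for illustration.

```python
from math import exp

def index_set(i):
    """J(i) as defined under (19): j with i + j in {2, 3, 5, 6, ...}, j <= 2i."""
    return [s - i for s in range(2, 3 * i + 1) if s % 3 != 1]

def limiting_tail(lam, max_i=50):
    """Truncated value of 3 * sum_i sum_{j in J(i)} (+-) exp(-lam^2 (i^2 - ij + j^2))."""
    total = 0.0
    for i in range(1, max_i + 1):
        sign = 1.0  # alternating signs along J(i), starting with +
        for j in index_set(i):
            total += sign * exp(-lam * lam * (i * i - i * j + j * j))
            sign = -sign
    return 3.0 * total
```

Under these assumptions the leading term for $\lambda = 1$ is $3e^{-1}$, and the later terms reduce the sum to roughly 0.86.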
PROOF. Put $l = \lambda n^{1/2}$ in (19). Since, for fixed $k_1$, $k_2$, $k_3$ with $k_1 + k_2 + k_3 = 0$,

(20) $\lim_{n \to \infty} \dfrac{(n!)^3}{(n + k_1 n^{1/2})!\,(n + k_2 n^{1/2})!\,(n + k_3 n^{1/2})!} = e^{-(k_1^2 + k_2^2 + k_3^2)/2},$

it suffices to show that, for $k$ large enough,

(21) $R(k, n, \lambda) = \left| \sum_{i=k}^{[n^{1/2}/\lambda]} \sum_{j \in J(i)} (\pm)\, \dfrac{(n!)^3}{(n - i\lambda n^{1/2})!\,(n + j\lambda n^{1/2})!\,(n + (i - j)\lambda n^{1/2})!} \right|$

is arbitrarily small, uniformly in $n$ for large $n$.


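Rewriting the terms of (21) and putting the absolute value signs inside the first summation,

(22) $R(k, n, \lambda) \le \sum_{i=k}^{[n^{1/2}/\lambda]} \dfrac{(n!)^3}{(n - i\lambda n^{1/2})!\,(2n + i\lambda n^{1/2})!} \left| \sum_{j \in J(i)} (\pm) \dbinom{2n + i\lambda n^{1/2}}{n + j\lambda n^{1/2}} \right|.$

For fixed $i$, the absolute values of the terms of the alternating series increase monotonically to the maximum

$\dbinom{2n + i\lambda n^{1/2}}{n + [i/2]\lambda n^{1/2}}$

and then decrease monotonically. Hence

$\left| \sum_{j \in J(i)} (\pm) \dbinom{2n + i\lambda n^{1/2}}{n + j\lambda n^{1/2}} \right| \le \dbinom{2n + i\lambda n^{1/2}}{n + [i/2]\lambda n^{1/2}},$

and (22) yields

(23) $R(k, n, \lambda) \le \sum_{i=k}^{[n^{1/2}/\lambda]} b_i,$

where

(24) $b_i = \dfrac{(n!)^3}{(n - i\lambda n^{1/2})!\,(n + [i/2]\lambda n^{1/2})!\,(n + (i - [i/2])\lambda n^{1/2})!}.$

It is easy to show by direct computation that
1) $b_i/b_{i+1}$ is increasing in $i$,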


An1/2 
2) bk / dea 2 (1 + E |» : , Which is uniformly close to 


k/2 


22 o 
eI for n large. 





Hence, by (23), R(k, n, \) is essentially bounded by 


42 


(25) RI OO) 
for n large. But, by (20) and (24), (25) is approximated by


0) (hk? + [ke /2] 2k [ke /2)) —[k/2}d2 
de + [k/2} { Dd a i [k/2] ) 


for n large; this establishes (21). 
DISTRIBUTION OF A SERIAL CORRELATION COEFFICIENT
NEAR THE ENDS OF THE RANGE¹


By M. M. SIDDIQUI
University of North Carolina

1. Introduction and summary. If $y_1, \dots, y_N$ are observations on a stationary time series at equal intervals of time and it is known that $E y_t = 0$ for $t = 1, \dots, N$, the most natural definition of a serial correlation coefficient with lag unity would be

$r^* = \left( \sum_{i=1}^{N-1} y_i y_{i+1} \right) \Big/ \left[ \left( \sum_{i=1}^{N-1} y_i^2 \right) \left( \sum_{i=1}^{N-1} y_{i+1}^2 \right) \right]^{1/2}$

if the denominator $\ne 0$. This is the ordinary correlation coefficient between $(y_1, \dots, y_{N-1})$ and $(y_2, \dots, y_N)$, except that instead of taking deviations from the sample mean, we have taken deviations from the population means. Due to the seemingly insurmountable mathematical difficulties involved in obtaining the distribution of $r^*$ even on the hypothesis of independence and normality of the observations, several alternative definitions have been proposed as approximations to $r^*$. However, it is desirable to consider some relevant properties of the distribution of $r^*$.

In this paper the distribution of $r^*$ near the extremities of its range will be considered. The observations will be assumed to be distributed as independent $N(0, 1)$ variates. There is no loss of generality in assuming the variance to be unity as $r^*$ is independent of the scale parameter. A geometrical approach suggested by Hotelling seemed to be particularly suitable in obtaining the order of contact of the distribution curve at $r^* = \pm 1$. Hotelling [1] shows how to determine the order of contact of frequency curves of some statistics with the variate axis at the ends of the range even though the actual distributions are unknown. It will be shown here that if for a number $r_0$ in $[0, 1]$ and close to 1, $P(r^* \ge r_0)$ is expanded in a series of powers of $(1 - r_0)$, the first non-zero coefficient is that of the power $(N - 2)/2$. Upper and lower bounds for the coefficient of this power will be calculated. The lower bound is positive and the upper bound gives an approximation for an upper bound on $P(r^* \ge r_0)$.


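2. Geometrical representation. Let $X_1, \dots, X_N$ be $N$ independent $N(0, 1)$ variates. Define

(2.1) $r^* = \left( \sum X_i X_{i+1} \right) \Big/ \left[ \left( \sum X_i^2 \right) \left( \sum X_{i+1}^2 \right) \right]^{1/2},$

where all the summations are from 1 to $N - 1$ and the denominator $\ne 0$; then $r^*$ is a variate with range $[-1, 1]$.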
If for every set of observations $y_1, \dots, y_N$ on these variates we take a point $S$ with coordinates $(y_1, \dots, y_N)$ in an $N$-dimensional Euclidean space, which may be regarded as a representation of the sample space, then, denoting the origin by $O$, we see that the points $S$ are distributed with spherical symmetry about $O$. Furthermore, a unique value of $r^*$ corresponds to all the points on a straight line $OS$, excepting the origin. Let the straight line $OS$ meet the $(N - 1)$-dimensional unit sphere in $Q$ and $Q'$, where $Q$ is on the same side of the origin with $S$. Denoting by $(x_1, \dots, x_N)$ the coordinates of $Q$, we have

(2.2) $\sum_{i=1}^{N} x_i^2 = 1,$

which may also be taken as the equation of the unit sphere. The points $Q$ and $Q'$ may be considered to determine a unique value of $r^*$. Considering only the point $Q$, it is easily seen that the distribution of $Q$ is uniform over the unit sphere; that is, denoting the total $(N - 1)$-dimensional surface area of (2.2) by $S_{N-1}$, the probability of $Q$ falling in an area $A$ on the sphere is $A/S_{N-1}$.


For a given $r_0$ in $[-1, 1]$ there exists a set of points on the unit sphere such that for each point in this set the corresponding value of $r^*$ lies in the interval $[r_0, 1]$, and for no other point. If this set of points covers an area $A$ on the surface of the sphere (2.2), it follows that

$P(r^* \ge r_0) = A/S_{N-1}.$

We observe that $r^* = 1$ if and only if $x_i = \lambda x_{i-1}$, $i = 2, 3, \dots, N$, $\lambda > 0$ and $x_1 \ne 0$, that is, $x_i = \lambda^{i-1} x_1$, $i = 2, 3, \dots, N$, $\lambda > 0$ and $x_1 \ne 0$. Since the point $(x_1, \dots, x_N)$ lies on (2.2), we obtain for the value of $x_1$, $x_1 = \pm c$ where

(2.3) $c = (1 - \lambda^2)^{1/2} (1 - \lambda^{2N})^{-1/2}.$

Denote the variable point $(c, \lambda c, \dots, \lambda^{N-1} c)$ by $P$ and $(-c, -\lambda c, \dots, -\lambda^{N-1} c)$ by $P'$. As $\lambda$ varies from 0 to $\infty$, each of $P$ and $P'$ describes a curve for every point of which, excepting the two points of each curve obtained by $\lambda = 0$ and $\infty$, corresponds the value of $r^* = 1$.

Since both these curves are exactly alike, except for their position in space, we confine our attention to the curve

(2.4) $x_1 = c, \quad x_i = \lambda^{i-1} x_1, \quad i = 2, \dots, N, \quad 0 < \lambda < \infty.$

Further, from now on we reserve $(x_1, \dots, x_N)$ to denote the point on curve (2.4) which corresponds to the parameter $\lambda$, and we use $(\varepsilon_1, \dots, \varepsilon_N)$ to denote any other point on the unit sphere.
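A quick numerical check of (2.3) and (2.4) is sketched below: a point of the curve should lie on the unit sphere (2.2) and give $r^* = 1$. The function names are illustrative, and the value $c = N^{-1/2}$ at $\lambda = 1$ is the limit of (2.3), filled in here so the sketch does not divide by zero.

```python
import math

def curve_point(lam, N):
    """Point (c, lam*c, ..., lam**(N-1)*c) of curve (2.4), with c from (2.3)."""
    if lam == 1.0:
        c = N ** -0.5  # limiting value of (2.3) as lam -> 1
    else:
        c = math.sqrt((1.0 - lam ** 2) / (1.0 - lam ** (2 * N)))
    return [c * lam ** i for i in range(N)]

def serial_correlation(x):
    """r* of (2.1) for the coordinates x_1, ..., x_N."""
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = math.sqrt(sum(a * a for a in x[:-1]) * sum(b * b for b in x[1:]))
    return num / den

for lam in (0.3, 1.0, 2.5):
    x = curve_point(lam, 6)
    # Both quantities should equal 1 up to rounding error.
    print(lam, sum(v * v for v in x), serial_correlation(x))
```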

To find the probability of $r^*$ exceeding a given value $r_0$ which is close to 1, we consider the points within a "tube" of geodesic radius $\theta$ on the surface of the sphere (2.2) with its axial curve (2.4).

Let the length of the curve (2.4) measured from $P_0(1, 0, \dots, 0)$ to $P(x_1, \dots, x_N)$ be denoted by $s$, or more explicitly $s(\lambda)$, and an element of curve by $ds$. Denoting by primes the differential coefficient with respect to $s$, the direction cosines of the tangent to the curve at $P$ are

$x_1', x_2', \dots, x_N',$

where

(2.5) $x_i' = \left[ (i - 1)\lambda^{i-2} c + \lambda^{i-1}\, dc/d\lambda \right] d\lambda/ds.$

We note that

(2.6) $\sum_{i=1}^{N} x_i'^2 = 1,$

and since $\sum_{i=1}^{N} x_i^2 = 1$,

$\sum_{i=1}^{N} x_i x_i' = 0.$

Let the coordinate axes be rotated so that the new coordinates are denoted by the elements of a vector $\alpha$. Let $\alpha = B\varepsilon$ where

(2.7) $B = \begin{pmatrix} x_1' & x_2' & \cdots & x_N' \\ x_1 & x_2 & \cdots & x_N \\ b_{31} & b_{32} & \cdots & b_{3N} \\ \vdots & \vdots & & \vdots \\ b_{N1} & b_{N2} & \cdots & b_{NN} \end{pmatrix}$

and

(2.8) $BB' = I.$

Here $I$ denotes the identity matrix, $B'$ denotes the transpose of $B$, and $\varepsilon$ and $\alpha$ denote the column vectors $(\varepsilon_1, \dots, \varepsilon_N)$ and $(\alpha_1, \dots, \alpha_N)$ respectively.
The $\alpha_1$ axis is now parallel to the tangent of the curve at $P$ and the $\alpha_2$ axis coincides with the line $OP$.
The $(N - 3)$-dimensional sphere given by the set of equations

(2.9) $\alpha_1 = 0, \quad \alpha_2 = \cos\theta, \quad \alpha_i = \beta_i \sin\theta, \quad i = 3, \dots, N,$

with

$\sum_{i=3}^{N} \beta_i^2 = 1,$

lies entirely on the $(N - 1)$-dimensional unit sphere

(2.10) $\sum_{i=1}^{N} \alpha_i^2 = 1 = \sum_{i=1}^{N} \varepsilon_i^2.$

The sphere (2.10) is the same as (2.2) with a change of notation. Each point on (2.9) is at a geodesic distance $\theta$ from $P$ measured on the sphere (2.10). Further, since (2.9) lies in the plane $\alpha_1 = 0$, this hypersphere is perpendicular to the tangent of curve (2.4) at $P$.

Changing back to the original coordinates we have $\varepsilon = B'\alpha$ or

$\varepsilon_i = x_i'\alpha_1 + x_i\alpha_2 + b_{3i}\alpha_3 + \cdots + b_{Ni}\alpha_N, \quad i = 1, \dots, N.$

Equations (2.9) become

(2.11) $\varepsilon_j = x_j \cos\theta + \sin\theta \sum_{i=3}^{N} b_{ij}\beta_i, \quad j = 1, 2, \dots, N,$

with

$\sum_{i=3}^{N} \beta_i^2 = 1.$


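3. The value of $r^*$ near the curve. Let us calculate the value of $r^*$ corresponding to a point $(\varepsilon_1, \dots, \varepsilon_N)$ on the hypersphere (2.11). We have

(3.1) $r^* = \left( \sum_{i=1}^{N-1} \varepsilon_i \varepsilon_{i+1} \right) \left[ (1 - \varepsilon_1^2)(1 - \varepsilon_N^2) \right]^{-1/2}$

since

$\sum_{i=1}^{N-1} \varepsilon_i^2 = 1 - \varepsilon_N^2 \quad \text{and} \quad \sum_{i=1}^{N-1} \varepsilon_{i+1}^2 = 1 - \varepsilon_1^2.$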


Inserting the values of e’s from (2.11) in terms of x’s, using equations (2.4)- 
(2.8) and neglecting the terms of order sin’ 6, we have, after some algebraic 
simplification 


1—r* (Q—-v)1 +A"), (bk — a) — A” 
= | ene . : 
sin? 6 r 2rv2(1 — d2¥-2) + N2(1 — A242) 
N N 2N-—2\ N 1 N N 
M1 —A 
(3.2 ’ ls DD diabiv BiBe — — oo De 2 bij be.541B: Bi 
k i=% = ¢ I=1 k=3 1=3 


(1 — A**) (DOhs ba8)*> , 2 fX ; 
— _ . +X b; i 
Q(1 — X°) nN . 2X ne 
As an approximation replace the terms in the square bracket in (3.2) by their expectations. Since $\beta_3, \dots, \beta_N$ are Cartesian coordinates of a point on the $(N - 3)$-dimensional unit sphere

$\sum_{i=3}^{N} \beta_i^2 = 1,$

the distribution of $(\beta_3, \dots, \beta_{N-1})$ is given by (see for example [3], p. 385)


r((N — 3)/2) 3°  dBn_1 
(N—3) /2 / 4 


oo 3 tt — By)! 


From this or from considerations of symmetry we have
$E\beta_i = 0,$
$E\beta_i\beta_k = 0, \quad i \ne k,$


Rearranging the terms of (3.2) and using the orthogonal property of B, we 
have after simplification 


2 ij 2N 
. zs *)\l/2 , (tb —avajyd + Aa”™) 
sin 6 (1 — r*) E + D1 — )2"-2) 


* WW = Ha 

4. Integral expression for $P(r^* \ge r_0)$. To find the probability that $r^*$ exceeds $r_0$, where $r_0 < 1$ and close to 1, we proceed in the following manner. For a given $\lambda$, $r_0$ determines a unique positive value of $\sin\theta$, hence a unique value of $\theta$ in the interval $[0, \pi/2]$, say $\theta(\lambda)$. From (3.3) it follows that, for a given $\lambda$, a point at geodesic distance not exceeding $\theta(\lambda)$ from $P$ has $r^* \ge r_0$, and vice versa. By a theorem of Hotelling [2, p. 451] the $(N - 1)$-dimensional "area" of a tube of length $ds$ and geodesic radius $\theta$ on the surface of the $(N - 1)$-dimensional sphere $\sum_{i=1}^{N} \varepsilon_i^2 = 1$ is

$\dfrac{\pi^{(N-2)/2}}{\Gamma(N/2)} \sin^{N-2}\theta \, ds.$

The probability that a random point $(\varepsilon_1, \dots, \varepsilon_N)$ falls in this elemental tube is the ratio of the $(N - 1)$-dimensional area of the tube to the area of the unit sphere. This ratio equals

$(2\pi)^{-1} \sin^{N-2}\theta \, ds.$
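The ratio can be checked in one line, assuming the standard value $S_{N-1} = 2\pi^{N/2}/\Gamma(N/2)$ for the surface area of the unit sphere in $N$ dimensions (a formula not written out in the text):

```latex
\frac{\pi^{(N-2)/2}}{\Gamma(N/2)}\,\sin^{N-2}\theta\,ds
\;\Big/\;
\frac{2\pi^{N/2}}{\Gamma(N/2)}
\;=\;
\frac{\pi^{(N-2)/2}}{2\pi^{N/2}}\,\sin^{N-2}\theta\,ds
\;=\;
(2\pi)^{-1}\sin^{N-2}\theta\,ds .
```

The gamma functions cancel, so the factor $(2\pi)^{-1}$ does not depend on $N$.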


Remembering that for $r^* = 1$ there correspond two curves on the unit sphere, one traced by $P$ and the other by $P'$, and noting that changing the signs of all $\varepsilon$'s in (3.1) does not change the value of $r^*$, we obtain

(4.1) $P(r^* \ge r_0) = \pi^{-1} \int_0^{\infty} \sin^{N-2}\theta \, ds(\lambda),$

where the variable of integration is $\lambda$. This can be written as

(4.2) $P(r^* \ge r_0) = \pi^{-1} (1 - r_0)^{(N-2)/2} \int_0^{\infty} [g(\lambda)]^{-(N-2)/2} h(\lambda) \, d\lambda,$

where


— *)(1 + r**) (1 — es as *) 


| (1 
3 oe - 14 4-52 —*. e— S 
| mw) = 1+ Sead — ey + 20 — 


and 


ds de. 1 N*)24-2 | 2 
(44 a i ee we dr; cs wg. 
_ 2 fe (= ) | FE —yy ” 71 — 


We note here that E. S. Keeping [4] has studied the integral of $h(\lambda)$ over the range $[0, \infty]$.

If we change the variable of integration from $\lambda$ to $1/\lambda$ we observe that the integral in (4.2) remains unchanged, hence the integral from 0 to 1 is the same as from 1 to $\infty$. Writing $J$ for the integral in (4.2), we have

(4.5) $J = 2 \int_0^1 [g(\lambda)]^{-(N-2)/2} h(\lambda) \, d\lambda.$

By considering the sign of the differential coefficient of $g(\lambda)$ in the interval $[0, 1]$ we find that $g(\lambda)$ is a monotonically decreasing function of $\lambda$, and

$g(0) = \infty, \quad g(1) = N/(N - 1).$

Write

(4.6) $\chi(\lambda) = \dfrac{1}{g(\lambda)};$

then $\chi(\lambda)$ is a monotonically increasing function of $\lambda$ in $[0, 1]$ with

(4.7) $\chi(0) = 0 \quad \text{and} \quad \chi(1) = 1 - 1/N.$


5. Bounds on the integral $J$. From (4.5) and (4.6) we have

(5.1) $J = 2 \int_0^1 [\chi(\lambda)]^{(N-2)/2} h(\lambda) \, d\lambda.$

Now $\chi(\lambda)$ can be written as
r 1 _ im Nv (1 oe ie —1 
2.2) (A) = 2h i — = ee . 
— : YC o [r+ Me 
Make the transformation

(5.3) $\lambda = e^{-x/N}$

in (5.1), then
o(N—2)/2 


J = 


N 
(5.4) | [ (cosh 5 — coth x sinh :) (coseeh? i — N° cosech® x 
0 


, N ( 12% aay [oe 
}! +. vu 3(: osh coth x sinh V )| 


Using elementary expansions of hyperbolic functions in power series, for example,

$\cosh x = 1 + \dfrac{x^2}{2!} + \dfrac{x^4}{4!} + \cdots$

for every $x$ and, for $|x| < \pi$,

$\coth x = \dfrac{1}{x} + \dfrac{x}{3} - \dfrac{x^3}{45} + \cdots,$

and after some binomial and exponential expansions, we finally obtain 


2 4 
—a/Ny)(N—2)/2 1 _« c 
errata 


+= 


where this expansion is valid for $|x| < \pi$.
Similarly 


z : | a 135 sa 
h(e*™) d(e**) = - nl! “gt gaa* + +00 ) |e 


We split the range of integration in (5.4) into the ranges $[0, 1]$ and $[1, \infty]$. Denoting the integral from 0 to 1 by $J_1$, we have, omitting the terms $O(N^{-1})$,


hese [(i-t4+54.. 
on Ce 6” 40 
(1+ ape to )ae = 319216) = 0196. 


Hence 
(5.5) J = 0.196 +2 | [x(a)] Par) ad. 
0 


Denote by $J_2$ the second term on the right hand side of (5.6) and substitute $\lambda = y^{1/2}$ so that
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I (G—y) 


Ny(i — y*~’) | wills | 
ay ome oe one 
| ° (N — 2)(1 — y¥) 


It can easily be shown that 


1/2 
¢ 


ee a 
(I+ ye? 


N*y*"( — y)* }" ae Ny" —y* Ny" — y) 
(1 — y*")° 2(1 — y*)? 8(1 — y%)4 
We then have 
(5.9) ea <I <Q, 


where 


(5.11) M(p,q,e€*) = [ y (1 + y) * dy 
“0 


and 
2/N 


(5.12) L(p, q,€° *) = | y’(1 + y) (1 — y)" ay, 
0 


where q = (N — 2)/2 and p = sq+b,s > 0. 
Substituting y = e°“z and expanding (1 + ze~”*)~™ in powers of (1 — 2) 
and integrating term by term, we obtain 


2(p4))/N 
—9/N ° 1 
J 9 € — = —. ena ain ” ’ 2 
I(p,q,e) @p+hDatre nya (1,9 p T= san) 


—2( p+) /N 2 


—9/N € 1 
F io = - ° 4; k 2,——x }- 
L(p, 9 ) i am P( Qptk+ an) 
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Ifs>1,b>O0andz>0 
(¢ -+ 1) ° 
ee St cape amen it a 
Fl, q, sq + 0,2 ito 7h" * Gite ttrt so’ + 


q q 2 
+ 2 ss ae me er 
+ Sq + b *+ (sq + b)? -? 


—1 
: _ qe 
(1 sq + :) 


and 


q , 
Sq + b p 


F(l,q.sq+ 6,2) < 1+ 
" qq + 1) — 2 a 
™ (sq + b)(sq + b+ p zl +i+rrt 


Since q = O(N), omitting the terms of O(N') we have 


2s 1 1+s+ 2s" 
st < Fk (, q, 8¢ + b, i-e :) < ae i 


A systematic calculation then shows that 


042 


D(N--2)/2 [1 + O(N ‘)] < L(p, q, € 7 a < rs 2 ll + O(N~")). 


Denoting the integrals of successive terms in (5.10) by $Q_1$, $Q_2$, etc., as they occur in order and neglecting the sign, we see that
Q,=2 
Hence

$0.542 < Q_1 < 0.629.$

Similar calculations on the following terms show that

$Q < .629 - .065 - .101 + .029 - .005 = .487$

and

$Q > .542 - .066 - .103 + .028 - .006 = .395.$

The terms diminish very rapidly and the later terms do not affect the second decimal place. Thus from (5.9)

$.239 < J_2 < .487,$

and since $J = 0.196 + J_2$ by (5.5), therefore

(5.13) $.435 < J < .683.$
These calculations are valid to two decimal places and to $O(N^{-1})$. Finally, the first term, $P_0$, in the expansion of $P(r^* \ge r_0)$ in powers of $(1 - r_0)$ is

$P_0 = J\pi^{-1}(1 - r_0)^{(N-2)/2}.$

It is easy to see that the first term in the expansion of $P(r^* \le -r_0)$ is the same as $P_0$.
If the population mean is known to be zero, the frequency function of the ordinary correlation coefficient, $r$, for a sample of size $N$ is given by


N
r(
‘ 2
Ir) = - ~
V ri =


Therefore the first term in the expansion of $P(r \ge r_0)$ in powers of $(1 - r_0)$ is approximately


P=2
NORE (N — 2) 41 — 1)-?”


Hence
P,/P = 2 oe


which tends to zero as $N$ tends to infinity.
DISTRIBUTIONS OF THE MEMBERS OF AN ORDERED
SAMPLE


By CHARLES E. CLARK AND G. TREVOR WILLIAMS


Booz, Allen and Hamilton, and The Johns Hopkins University


1. Introduction. Let the members of a random sample from a distribution $F(x)$ with probability density $F'(x) = f(x)$ be in order of magnitude $x_1, \dots, x_m, \dots, x_n, \dots, x_N$, with $x_i \le x_{i+1}$, $i = 1, \dots, N - 1$, and $m < n$. We shall compute the moments of the distribution of $x_m$ and of the joint distribution of $x_m$ and $x_n$.

The results are derived under the assumption that $F^{-1}(x)$, the inverse of $F(x)$, is a polynomial. Then we discuss the applicability of the results to any distribution for which $F^{-1}(x)$ is differentiable at $m/(N + 1)$ and $n/(N + 1)$. In this general case no restriction on $F(x)$ is imposed other than the differentiability; in particular, the interval on which $0 < F(x) < 1$ can be finite, semi-finite, or infinite.


2. Present status of the problem. This problem is handled through analyses of several specific distributions in reference [1] listed at the end of this paper. It is suggested that any one of the Pearson type frequency curves can be adequately approximated by one of the density functions handled in that paper. Although a general method is employed, there is no general development or general results; each distribution requires special, extensive computations. In contrast to these earlier results, the present paper contains a general development with results that are easily specialized to particular distributions.

Following [1] there have been discussions of asymptotic distributions. It is known that if $m$ and $N$ increase with $m/N$ approaching a limit different from zero and one, under quite general conditions the distribution of $x_m$ is asymptotically normal; see [2] or [3]. Also it was pointed out in [4] that with some restrictions on the distribution function the limiting distribution of $x_m$, as $N$ increases but $m$ is fixed, has the probability density


m”™exp [my — exp (—y)]/(m — 1)! 


where $y$ is a normalization of $x_m$; see [5]. However, it is suggested in [6] that
in the case of the normal distribution if m = 1, one should have a sample of 
size 10", and Mr. Kendall concludes in [5], p. 221, that “For practical pur- 
poses, therefore, there is still no adequate general approximate form for the 
distribution of mth values.’’ However, a contribution to the asymptotic case of 
this problem is made in [6]. In contrast to these asymptotic results, the present 
paper is concerned with the exact sampling distributions for any sample size. 
In the case of large samples, known approximations concerning moments are 
equivalent to the leading terms of some of the expansions of this paper. 
3. The moments of the distribution of $x_m$. The probability density function of $x_m$ is

(1) $[B(m, N - m + 1)]^{-1} [F(x)]^{m-1} [1 - F(x)]^{N-m} f(x)$

where the coefficient is the reciprocal of the beta function.

We shall use the random variable $t = F(x_m)$ whose probability density function is $[B(m, N - m + 1)]^{-1} t^{m-1} (1 - t)^{N-m}$. We denote the central moments of this distribution by $\nu_i$, $i = 0, 1, 2, \dots$. Using $p$ to denote the mean, we compute that $p = m/(N + 1)$.

At first we shall assume that the inverse of $F(x)$ is

(2) $F^{-1}(x) = \sum_i a_i (x - p)^i.$

Later we shall remove the restriction that $F^{-1}(x)$ is a polynomial.
The $k$th raw moment of the distribution of $x_m$ immediately reduces to

$\mu_k' = [B(m, N - m + 1)]^{-1} \int_0^1 [F^{-1}(t)]^k \, t^{m-1} (1 - t)^{N-m} \, dt, \quad k = 0, 1, \dots.$

For each $k$ we can write as a finite sum

$[F^{-1}(t)]^k = \sum_i b_i (t - p)^i,$

where the coefficients $b_i$ are functions of $a_i$ and $k$. In this notation we have

(3) $\mu_k' = \sum_i b_i \nu_i.$
We calculate that 


1 2 N! 1 al ” 
~ (;) (m — 1)N — m)! P) ; ) ‘ 


i 0 Ft tpregenn a 
iF - (—1 tis 

p > (: ys 1(N + 9). -(N +} “ome )) E ap ) ip’ + p)'. 

This expression will be reduced to a more convenient form. We use the identity 


m+A Aqp 
= | 
D(N + A + 1) . N 


and reduce v; to 


pe de-(C) TT (1+ 4), 


In this formula 


From this result we get, with $q = 1 - p$,

$\nu_0 = 1, \quad \nu_1 = 0, \quad \nu_2 = \dfrac{pq}{N + 2},$

$\nu_3 = \dfrac{2pq(q - p)}{(N + 2)(N + 3)},$

$\nu_4 = \dfrac{3p^2q^2 N + 3pq(2 - 5pq)}{(N + 2)(N + 3)(N + 4)},$

$\nu_5 = \dfrac{20p^2q^2(q - p)N + 4pq(q - p)(6 - 7pq)}{(N + 2)(N + 3)(N + 4)(N + 5)},$
l5p'g@N° + l0p'¢'(13 _ — )N + Spq(24 — Y4pg +: 37p G) 


(N + 2)(N + 3)(N + 4)(N + 5)(N + 6) 
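The low-order central moments of $t = F(x_m)$ can be checked exactly, because $t$ has a beta distribution with parameters $m$ and $N - m + 1$. The sketch below compares moments computed from the raw moments of that beta distribution with the closed forms for $\nu_2, \dots, \nu_5$ in terms of $p$, $q = 1 - p$ and $N$. The function names are illustrative, and the closed forms coded here are the standard beta-distribution expressions, written in the $p$, $q$ notation of this section.

```python
from fractions import Fraction
from math import comb

def beta_central_moment(m, N, k):
    """Exact k-th central moment of t ~ Beta(m, N - m + 1)."""
    p = Fraction(m, N + 1)
    # Raw moments: E t^j = prod_{r<j} (m + r) / (N + 1 + r).
    raw = [Fraction(1)]
    for r in range(k):
        raw.append(raw[-1] * Fraction(m + r, N + 1 + r))
    # Binomial expansion of E (t - p)^k.
    return sum(comb(k, j) * raw[j] * (-p) ** (k - j) for j in range(k + 1))

def closed_form(m, N, k):
    """Closed forms for nu_2, ..., nu_5 in terms of p, q and N."""
    p = Fraction(m, N + 1)
    q = 1 - p
    if k == 2:
        return p * q / (N + 2)
    if k == 3:
        return 2 * p * q * (q - p) / ((N + 2) * (N + 3))
    if k == 4:
        return (3 * p**2 * q**2 * N + 3 * p * q * (2 - 5 * p * q)) / (
            (N + 2) * (N + 3) * (N + 4))
    if k == 5:
        return (20 * p**2 * q**2 * (q - p) * N
                + 4 * p * q * (q - p) * (6 - 7 * p * q)) / (
            (N + 2) * (N + 3) * (N + 4) * (N + 5))
    raise ValueError("only k = 2, ..., 5 are coded in this sketch")
```

Because the arithmetic uses `Fraction`, agreement between the two functions is exact rather than approximate.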

We shall use the notation $x_p = F^{-1}(p)$, and $f^{(i)} = f^{(i)}(x_p)$. We can express the $a_i$ in (2) in terms of the derivatives of $F(x)$ at $x_p$ by means of the relations between the derivatives of a function and its inverse. From the $a_i$ we calculate the $b_i$, and with the use of (3) we get the raw moments. These include

$\mu_1' = x_p - \dfrac{f'}{2f^3} \cdot \dfrac{pq}{N + 2} + \dfrac{3f'^2 - ff''}{6f^5} \cdot \dfrac{2pq(q - p)}{(N + 2)(N + 3)} + \dfrac{10ff'f'' - f^2f''' - 15f'^3}{24f^7} \cdot \dfrac{3p^2q^2N + 3pq(2 - 5pq)}{(N + 2)(N + 3)(N + 4)} + \cdots.$

Here as elsewhere derivatives are denoted by primes and powers by arabic numerical exponents. Finally the central moments $\mu_i$ are obtained, such as the following.


+ 


py : . pes -- Pp) 
r+2 pf W+ 2 4: 


av + 3pq(2 — 5p) 
N + 2)(N + 3)(N + 4) 


“pu\4 —_ Ff . 3p q¢ N + 3pq(2 — 5pq) 
t (N + 2)(N +: 2f° (N + 2)(N + 3)(N + 4) 


a 3p GN +: Spq(2 — dpq) 


ft (N+2)(N + ay (N “+ 4) 
From these results we check the well known fact that if $N$ increases with $m/N$ fixed, the asymptotic distribution of $x_m$ has the mean and variance $x_p$ and $pq/(f^2 N)$ respectively (see [3]). Furthermore the known result that for large $N$ the distribution is approximately normal is suggested by the following, which are obtained from the leading terms of the above expressions.
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bs \ - aves 4 


3/0 


M2 js 


uy 
Me 


We next discuss the applicability of the results to distributions for which $F^{-1}(x)$ is not a polynomial. We note that the factor

$[F(x)]^{m-1} [1 - F(x)]^{N-m}$

in (1) assumes its maximum value where $F(x) = (m - 1)/(N - 1)$. Hence (1) indicates that the probability density of $x_m$ is practically zero except in a small neighborhood of $F^{-1}[(m - 1)/(N - 1)]$.¹ Hence the moments of the distribution of $x_m$ can be determined with great accuracy from a knowledge of $F(x)$ in a small neighborhood of $F^{-1}[(m - 1)/(N - 1)]$. But this knowledge of $F(x)$ is given by a few derivatives of $F(x)$ at $x_p$ because $x_p$ is near

$F^{-1}[(m - 1)/(N - 1)].$

In other words, the first few terms of the Taylor expansion of $F^{-1}(x)$ at $x_p$ should be enough to permit an accurate determination of the moments. Hence the above derivation holds with very little error if (2) is understood to be a few terms of the Taylor expansion.


4. The median. The results simplify in the case $N = 2m + 1$. We can compute that

$\int_0^1 (t - \tfrac{1}{2})^i \, t^m (1 - t)^m \, dt,$

which is clearly zero when $i$ is odd, reduces when $i$ is even to

$\dfrac{m!}{2^{m+i}(i + 1)(i + 3) \cdots (i + 2m + 1)};$

the reduction is achieved by the substitution of $t = \sin^2\theta$ and use of a known integral (see [7]). This reduces, after multiplication by $[B(m + 1, m + 1)]^{-1}$, to

$\nu_i = \dfrac{1 \cdot 3 \cdot 5 \cdots (i - 1)}{2^i (2m + 3)(2m + 5) \cdots (2m + i + 1)}$ for even $i$.


5. The efficiency of the median. As a numerical illustration we shall compute the efficiency of the median as an estimator of the mean of a normal distribution. We consider $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and $\varphi = \varphi(0)$. The derivatives of F(x) at x = 0 are calculated from those of $\varphi(x)$. Using (3) with k = 1, 2 and the formulas of section 4, we obtain the variance of the median of a sample of size


¹ This statement is true even when m − 1 or N − m is small. If, for example, m − 1 is small, $F(x) < (m - 1)/(N - 1)$ for $x < F^{-1}[(m - 1)/(N - 1)]$, and $[1 - F(x)]^{N-m}$ is clearly small if x is at least a little greater than $F^{-1}[(m - 1)/(N - 1)]$.
N = 2n + 1 in the form


1 [ l 13 
402‘ ay! + or = Fre 
4y*(2n + 3) ; 4y*(2n + 5) Yig*(2n 4 5) (2n + 7} 


287 
+ 56880%(2n + 5)Qn + 7)Qn+o }, 


Since the sample mean is efficient, and since the variance of the sample mean is 1/(2n + 1), if E(2n + 1) is the efficiency of the median,


$$E(2n + 1) = [(2n + 1)\mu_2]^{-1}.$$
Evaluating $\varphi$ we obtain


1 
E(2n + 1) 


F 4 1.5707963 5.3460357 26484528 + 
\ 2n+ 5 (2n + 5)(2n + 7)  (2n + 5)(2n + 7)(2n + 9) 


A tabulation of this four term approximation appears in Table I. 
The series for the reciprocal of the efficiency converges slowly for small 


2n + 1. 


In cases n = 1, 2, 3, the fourth term contributes 2.8% , 1.6%, 1.0%, respec- 
tively, of the tabulated value. To check the accuracy of the approximation we 
have calculated accurately (as described below) the reciprocal of the efficiency 
in cases n = 1, 2, 3. The values correct to three decimal places are given in the 
table. The relative errors are 5.6% , 2.2%, 1.1%, respectively. 


TABLE I 
Efficiency of the Median, Normal Distribution 


{E(QQn + 1)}”, four term in 41) ie 
approximation [E(2m + 1)}"', exact E(2m + 1) 


1.571 1.571 .637 
1.567 638 
1.564 .639 
1.557 .642 
1.549 .646 
1.538 .650 
1.503 .665* 
1.486 .673* 
1.457 1.473 679 
1.402 1.434 697 
1.270 1.346 743 


0 
2 
0 


The third decimal places in E(11) and E(Q) are in doubt. 
The correct values of the reciprocal of the efficiency are obtained as follows. If n = 1, the reciprocal of the efficiency is, except for the factor


(2n + 1)/B(2, 2) = 18,
with $F'(x) = \varphi(x)$,


$$\int_{-\infty}^{\infty} x^2 F(1 - F)\varphi\,dx = \int_{-\infty}^{\infty} F\,d(-x\varphi + F) - \int_{-\infty}^{\infty} F^2\,d(-x\varphi + F)$$
$$= 1 - \int_{-\infty}^{\infty} (-x\varphi + F)\varphi\,dx - 1 + \int_{-\infty}^{\infty} (-x\varphi + F)\,2F\varphi\,dx$$
$$= -1/2 + 2/3 - 2\int_{-\infty}^{\infty} xF\varphi^2\,dx$$
$$= 1/6 - (2\pi)^{-1}\int_{-\infty}^{\infty} e^{-x^2}\varphi\,dx$$
$$= 1/6 - \frac{1}{2\pi\sqrt{3}}.$$


Multiplying this last number by 3/B(2, 2) we get


$$\frac{1}{E(3)} = 3 - \frac{3\sqrt{3}}{\pi} = 1.346$$
as given above.
For n = 2, 3 the reciprocals of the efficiencies were calculated by numerical evaluation of

$$\frac{2n + 1}{B(n + 1, n + 1)} \int_{-\infty}^{\infty} x^2 [F(1 - F)]^n \varphi\,dx.$$
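The numerical evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' computation: the function names, the use of Simpson's rule, the integration range and the step count are all choices made here for the example. It evaluates (2n + 1) times the variance of the median of 2n + 1 standard normal observations.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def reciprocal_efficiency(n, half_width=10.0, steps=4000):
    """(2n+1) * variance of the median of 2n+1 standard normals.

    The integral of x^2 [F(1-F)]^n phi is approximated by Simpson's rule
    on [-half_width, half_width]; `steps` must be even (assumed values).
    """
    def integrand(x):
        F = normal_cdf(x)
        return x * x * (F * (1.0 - F)) ** n * normal_pdf(x)

    h = 2.0 * half_width / steps
    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(-half_width + i * h)
    integral = total * h / 3.0
    return (2 * n + 1) * integral / beta_fn(n + 1, n + 1)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(2 * n + 1, round(reciprocal_efficiency(n), 3))
```

For n = 1 the result can be compared with the closed form 3 − 3√3/π, and for n = 2, 3 with the exact entries of Table I.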
6. The moments of the joint distribution of $x_m$ and $x_n$, m < n. We consider next the joint distribution of $x_m$ and $x_n$, m < n. The probability density is

$$\frac{N!}{(m - 1)!\,(n - m - 1)!\,(N - n)!}\,[F(x_m)]^{m-1}[F(x_n) - F(x_m)]^{n-m-1}[1 - F(x_n)]^{N-n} f(x_m) f(x_n).$$

The probability density of $t = F(x_m)$ and $u = F(x_n)$ is

$$\frac{N!}{(m - 1)!\,(n - m - 1)!\,(N - n)!}\,t^{m-1}(u - t)^{n-m-1}(1 - u)^{N-n}.$$
The expected values of t and u are $p_m = m/(N + 1)$ and $p_n = n/(N + 1)$ respectively. If $\nu_{\alpha\beta}$ is the expected value of $(t - p_m)^\alpha (u - p_n)^\beta$, we calculate that

$$\nu_{20} = \frac{p_m q_m}{N + 2}, \qquad \nu_{11} = \frac{p_m q_n}{N + 2}, \qquad \nu_{02} = \frac{p_n q_n}{N + 2},$$

$$\nu_{30} = \frac{2 p_m q_m (q_m - p_m)}{(N + 2)(N + 3)}, \qquad \nu_{21} = \frac{2 p_m q_n (q_m - p_m)}{(N + 2)(N + 3)},$$

$$\nu_{12} = \frac{2 p_m q_n (q_n - p_n)}{(N + 2)(N + 3)}, \qquad \nu_{03} = \frac{2 p_n q_n (q_n - p_n)}{(N + 2)(N + 3)},$$

$$\nu_{40} = \frac{3 p_m^2 q_m^2 N + 3 p_m q_m (2 - 5 p_m q_m)}{(N + 2)(N + 3)(N + 4)},$$

$$\nu_{31} = \frac{3 p_m^2 q_m q_n N + 3 p_m q_n (2 - 5 p_m q_m)}{(N + 2)(N + 3)(N + 4)},$$

$$\nu_{22} = \frac{p_m q_n [1 - (p_m + q_n) + 3 p_m q_n] N + p_m q_n [1 + 5(p_m + q_n) - 15 p_m q_n]}{(N + 2)(N + 3)(N + 4)},$$

$$\nu_{13} = \frac{3 p_m p_n q_n^2 N + 3 p_m q_n (2 - 5 p_n q_n)}{(N + 2)(N + 3)(N + 4)},$$

$$\nu_{04} = \frac{3 p_n^2 q_n^2 N + 3 p_n q_n (2 - 5 p_n q_n)}{(N + 2)(N + 3)(N + 4)}.$$
If $\mu'_{\alpha\beta}$ is the expected value of $x_m^\alpha x_n^\beta$,

$$\mu'_{\alpha\beta} = \frac{N!}{(m - 1)!\,(n - m - 1)!\,(N - n)!} \int_0^1 du \int_0^u [F^{-1}(t)]^\alpha [F^{-1}(u)]^\beta\, t^{m-1}(u - t)^{n-m-1}(1 - u)^{N-n}\,dt.$$


Let the Taylor expansion


$$[F^{-1}(t)]^\alpha [F^{-1}(u)]^\beta = a_{00} + a_{10}(t - p_m) + a_{01}(u - p_n) + a_{20}(t - p_m)^2 + a_{11}(t - p_m)(u - p_n) + a_{02}(u - p_n)^2 + \cdots$$

be finite. Then


$$\mu'_{\alpha\beta} = a_{00} + a_{20}\nu_{20} + a_{11}\nu_{11} + a_{02}\nu_{02} + a_{30}\nu_{30} + \cdots.$$





The coefficients $a_{ij}$ are expressed in terms of the derivatives of F(x) at $F^{-1}(p_m)$ and $F^{-1}(p_n)$.

As in the 1-dimensional case, if the Taylor expansion does not terminate, these results are approximations.

As an illustration of the results obtained in this manner, the covariance of $x_m$ and $x_n$ reduces to





/ 
Vix g5 I - * Pm J = 2Pm dn (Gm — Pm) 
my) n . 


Jain N +2 ~ «6 Of8f, (N + 2)(N + 3) 







, 


_ fun. 2pmgn(Gn — Dn) 
Qf fm (N + 2)(N + 3) 


+ 3f — Imm 3padmQnN + 3pmqn(2 — 5pm Gm) 


Ofmin (N + 2)(N + 3)(N + 4) 
+ hn —Infn  3PmPndnN + 3pm Gn(2 — 5Pn dn) 














6fifm = (N+ 2)(N+3)(N +4) 
a i Pm nll iad (Pm + Qn) + 3PmGnlN 
+ Sm Jn . + PmYntl - 5(pm + Qn) am 15Pm4n| 
4fnJs (N + 2)(N + 3)(N + 4) 








-~ Aa Ae + -+> 


where 





fO =f (p.)), t= 0,1, --- 


, 


4-2 —f. Pa =F pala — yp), 10s’ ~ ff" — 154" 
: 2f-7 N+2 6f® (N + 2)(N + 3) 24f? 
_ Bp N + 3pq(2 — 5pq) 
(N + 2)(N + 3)(N + 4)’ 
$A_m$ is obtained from A by affixing the subscript m to every f, p, and q, and $A_n$ is obtained similarly.


Using $\mu_2$ as calculated above, we obtain from the last result the first two terms of the coefficient of linear correlation in the form


1/2 ¢ \ 
—_ (Pmdn A | 
(ze, Za) = (= «) i! “sau 


/ 





in which 















fn it rt 
A= 174 Dm Ym — off? Pm dn + +f Pn Qn + 
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The following special cases are easily obtained. If f(x) = exp (—2), 
A = 3[pmqm Xp (2tm) — 2pmgn EXP (Xm + In) + PnGn exp (2z,)]. 
If f(z) = (27) exp (—2°/2), 


tn ImIXn rt a 
>» Pmdm — = Im Yn + » Pn dn 
er aa ae 


If f(z) = exp (—2x)x” ‘/T(r), 


A LUL(r)P((r — 1 — xm) tm” EXP (22m)Pmdm 


— 2r— 1 — am)(r — 1 — 2n)(Lmtn) “EXP (Tm + Ln) Pmn 
+ (r — 1 — ay) rn” exp (2tn)padn - 
POWER FUNCTION CHARTS FOR SPECIFYING NUMBERS OF
OBSERVATIONS IN ANALYSES OF VARIANCE OF FIXED
EFFECTS


By Leonard S. Feldt and Moharram W. Mahmoud

State University of Iowa and Egyptian Ministry of Education 
1. Summary. The charts presented in this paper are designed to facilitate 
the estimation of the number of observations per treatment required for analy- 
sis of variance tests of specified power. They are intended for use by experi- 
menters dealing with fixed treatments effects. With these charts the experi- 
menter may answer the following question: How many observations must I 
use per treatment to obtain a power of P; against alternative hypothesis H, ? 
Charts are presented for use with tests of treatments effects involving two to 
five levels of the treatment variable. The charts are strictly valid only for the 
completely randomized design; however they may be applied with relatively 
little error to tests of treatments effects in the randomized block and factorial 

designs, the latter employing a within-cells estimate of error variance. 


2. Nature of the charts. Charts are presented for α equal to .05 and .01 and k, the number of levels of the treatment variable, equal to 2, 3, 4 and 5. The charts are entered with the parameter λ, which is defined as follows:


$$\lambda = \sqrt{\frac{\sum_j (\mu_j - \mu)^2}{k\sigma^2}},$$


where $\mu_j$ is the mean of treatment population j, μ the mean of the combined treatment populations, $\sigma^2$ the population error variance, and k the number of treatments. The value of n, the number of observations which will be required per treatment for a test of specified power, is read directly from the ordinate of the appropriate chart. It is assumed that the same number of observations will be used in every treatment. The relation of λ to φ, the parameter customarily employed in the definition of the power function of the F-test, is simply


$$\lambda = \varphi/\sqrt{n}.$$
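The definition of the chart parameter can be sketched in a few lines. This is a minimal illustration, assuming the reading λ = sqrt(Σ(μ_j − μ)²/(kσ²)) and λ = φ/√n of the formulas above; the function names and the numerical inputs are hypothetical and chosen only for the example.

```python
import math


def chart_parameter(treatment_means, error_variance):
    """lambda = sqrt( sum_j (mu_j - mu)^2 / (k * sigma^2) )."""
    k = len(treatment_means)
    grand_mean = sum(treatment_means) / k
    sum_sq = sum((m - grand_mean) ** 2 for m in treatment_means)
    return math.sqrt(sum_sq / (k * error_variance))


def phi_from_lambda(lam, n):
    """phi = lambda * sqrt(n), the customary F-test noncentrality parameter."""
    return lam * math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical inputs: two treatment means 3.0 apart, error variance 10.
    lam = chart_parameter([0.0, 3.0], 10.0)
    print(round(lam, 3), round(phi_from_lambda(lam, 24), 3))
```

Because λ does not involve n, it can be fixed from the alternative hypothesis alone before the sample size is known, which is what allows the charts to be read directly.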


3. Historical development. The first extensive tables of the power function of analysis of variance tests were published by Tang [5]. The tables given by Tang are designed in such a way that for fixed values of α, φ, $f_1$ (degrees of freedom for treatments) and $f_2$ (degrees of freedom for error) the probability of a Type II error may be determined. The interval of tabulation of Tang's tables is .50, however, which is not sufficiently fine for accurate interpolation.

Following Tang's procedure, Lehmer [3] tabulated the values of φ for α = .05 and .01, P = .7 and .8, over a wide range of $f_1$ and $f_2$. These tables are quite complete within the power range considered; however, they cannot be conveniently used in the planning of experiments. From the tables the experimenter can tell only that a projected test will have a power less than .7, between .7 and .8, or greater than .8 against a specified alternative.

Pearson and Hartley [4] presented families of power curves for various combinations of α, $f_1$, and $f_2$ which make possible a direct estimate of the power of analysis of variance tests. These curves, like the tables of Tang, are entered with the parameter φ. For any given experimental setup, the power of the test may be read directly from the ordinate of the curve. These charts are well suited to the evaluation of the power of any given test. They cannot be easily employed, however, to indicate the value of n which should be adopted in order to secure a specified power. For this purpose, the experimenter must adopt the relatively inefficient approach of making repeated approximations until the value of n has been estimated with sufficient accuracy.

Fox [2] contributed charts which facilitate the determination of sample size. These charts were constructed from the tables of Tang and Lehmer and are essentially graphs of constant φ for varying values of $f_1$ and $f_2$. By a method of successive approximations, the value of n may be determined for a fixed value of α and a fixed value of P against a specified alternative. These charts are somewhat more convenient than the curves of Pearson and Hartley for this purpose, but they are somewhat laborious to use because of the iterative nature of the method of approximating n. Also, the charts do not extend below $f_1$ = 5. For experimenters dealing with fixed treatments effects, this limitation considerably restricts their usefulness.

Duncan [1] published a special condensation of the Pearson and Hartley charts. He plotted on a single set of axes the values of φ corresponding to P = .50 and .90 for various values of $f_1$ and $f_2$. Separate charts are presented for α = .05 and .01. Having $f_1$ and $f_2$ on the same chart facilitates computations which involve both of these elements. For use in planning experiments, however, these charts are subject to the same weaknesses as those of Pearson and Hartley.

Though several types of charts and tables of the power function of F-tests 
have been published, none permits a direct, non-iterative approximation for 
the number of observations required for a test of specified power. The charts 
presented in this paper make possible such an approximation for experiments 
which include 2 to 5 levels of the treatment variable. 


4. Construction of the charts. Each chart presents, for α = .05 and .01, a family of five curves which correspond to the following values of P: .5, .7, .8, .9 and .95. The number of observations per treatment (n) is plotted on the ordinate, the value of λ is plotted on the abscissa.


The numerical calculations for the coordinates of the points on the curves .7 and .8 were carried out from the tables of Lehmer; the calculations for the remaining curves were based on data read from the charts of Pearson and Hartley. The three basic steps in the calculations were as follows:
(1) Determine (from table or chart) pairs of values for φ and $f_2$ for specified value of P, $f_1$ and α.
(2) Solve $f_2$ for n from the relationship n = 1 + $f_2$/k, where k is the number of treatments and n the number of observations per treatment.
(3) Divide φ by √n to obtain λ.
The pairs of coordinates, n and λ, were then plotted and smooth curves fitted through these points.
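Steps (2) and (3) amount to a simple change of coordinates, which can be sketched as below. This is an illustrative sketch only: the function name and the input pair are hypothetical, and no value from the tables of Lehmer or the charts of Pearson and Hartley is reproduced here.

```python
import math


def chart_point(phi, f2, k):
    """Convert a tabulated pair (phi, f2) into chart coordinates (n, lambda).

    Step 2: n = 1 + f2 / k   (completely randomized design, f2 = k(n - 1)).
    Step 3: lambda = phi / sqrt(n).
    """
    n = 1 + f2 / k
    lam = phi / math.sqrt(n)
    return n, lam


if __name__ == "__main__":
    # Hypothetical pair for k = 2 treatments: phi = 2.0 at f2 = 30.
    n, lam = chart_point(2.0, 30, 2)
    print(n, lam)
```

Each (φ, $f_2$) pair read for a fixed P, $f_1$ and α yields one point on the corresponding curve of constant power.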


5. Example. An experimenter wishes to investigate the legibility of two 
common styles of handwriting: manuscript and cursive. These styles, which 
constitute the two “levels” of the treatment variable, are to be compared for a 
population of fourth grade children. The measure of legibility to be employed 
is based on the number of regressions in the eye movements of adult readers as 
they read a standard passage written in one or the other of these styles. Previous research with this measure has given rise to an error variance of 10.00, an
estimate which may be taken as a population value for this purpose. The com- 
pletely randomized design is to be used. For a difference of 3.0 between the 
population means, the experimenter wishes the power of the F-test to equal 
.90. The .05 level of significance has been adopted. 

For this situation

$$\lambda = \sqrt{\frac{\sum_j (\mu_j - \mu)^2}{k\sigma^2}} = \sqrt{\frac{(1.5)^2 + (1.5)^2}{2(10)}} = .474.$$

Entering Figure 1 with this value, and using the curve for P = .90, α = .05, we read the required number of observations per treatment to equal 24+.








Fig. 1. Curves of constant power (P) for the test of main effects with k=2.


Fig. 2. Curves of constant power (P) for the test of main effects with k=3.


Fig. 3. Curves of constant power (P) for the test of main effects with k=4.


In this example the difference between the population means and the error 
variance were separately specified. It is often the case, however, that the al- 
ternative hypothesis can be defined as a proportion of the error variance. For 
example, the experimenter might desire a certain power against the alternative 

$$\frac{\sum_j (\mu_j - \mu)^2}{k} = .100\,\sigma^2.$$


In this case


$$\lambda = \sqrt{\frac{\sum_j (\mu_j - \mu)^2}{k\sigma^2}} = \sqrt{1/10} = .32.$$


The value of λ is thus specified without an explicit statement of the absolute differences between treatment population means.


Fig. 4. Curves of constant power (P) for the test of main effects with k=5.


6. Note. Steps 2 and 3 in the derivation of these charts are based on the relationship which holds between $f_2$ and n for the completely randomized design. Since this relationship varies from one experimental design to another, these charts are strictly valid only for the completely randomized setup. For precisely accurate determination of the value of n in any other design, a unique set of charts for that design would be required. Charts for the randomized block design, for example, would be based on the relationship

$$f_2 = (k - 1)(n - 1), \qquad n = 1 + \frac{f_2}{k - 1}.$$

Charts for the test of the factor with k levels in the k × h factorial design would be based on the relationship

$$f_2 = k(n - h), \qquad n = h + \frac{f_2}{k}.$$

However, from charts specifically constructed for each of these designs it was found that when k(n − 1) ≥ 20 the relationship between λ and n is almost identical for all three designs. Little inaccuracy results from the application of charts based upon the relationship which holds for the completely randomized design.

The relatively small error involved in using the present charts for planning 
randomized block and factorial experiments is demonstrated by the values in 
Table 1. In this table the appropriate numbers of observations are indicated 
for selected experimental conditions involving the three types of designs. The 
values of n for the randomized block and factorial designs were derived from 
the charts specially constructed for these designs. It may be seen that in every 
instance the value of n read from charts constructed for the completely ran- 
domized design is only slightly smaller than that read from charts specific to 
the other designs. The underestimate is less than one observation in almost 


TABLE 1 


Comparative Values of n for Completely Randomized, Randomized 
Block, and Factorial Experiments (a = .05) 


n 


Completely Randomized <a 
s 7 a b act a l 
Randomized Block k X k Factoria 


8.0 8.9 
16.0 16.9 
30.0 30.9 
60. 60. 


9. 


i. 
all cases. Therefore, for the practical purpose of approximating the necessary
number of observations per treatment in randomized block and factorial ex- 
periments, it would seem sufficiently precise to use values read from the present 
charts. 
LIMITING DISTRIBUTIONS IN SOME OCCUPANCY
PROBLEMS


By Irving Weiss
Bell Telephone Laboratories, North Andover, Mass.


Section I


Introduction. The classical occupancy problem is concerned with the random 
distribution of a specified number of objects (r) in a given number of cells (N). 
No restriction is placed on the number of objects in any cell other than that 
the total number of objects equals r. The problem of finding exactly m cells 
empty for the case with r and N finite, and with all arrangements of r objects 
having equal probability can be expressed in closed form [1]. However, for 
large N, use of this formula for computation becomes exceedingly tedious. 
Several authors, [2] and [3] have stated without proof that under suitable re- 
strictions on N, r the limiting distribution of the number of unoccupied cells 
as N, r approach infinity is normal. 

By imposing the restriction α = r/N, α > 0, it will be shown that in the above occupancy problem the asymptotic distribution of the number of unoccupied cells is normal.

A modification of the above occupancy problem is the following: g objects are randomly distributed among N cells such that no more than one object is in any cell. The procedure is repeated w times. For example, with w = k, the maximum number of objects in any cell is k, one for each of k trials. It can be shown that by restricting gw = αN, α > 0, the normal asymptotic result given above holds. Also, by imposing the restriction gw = N log N/λ the number of unoccupied cells has asymptotically a Poisson distribution. This is an extension of the same results listed by Feller [1] for the classical occupancy problem. Proofs for the modified occupancy problem have been given by the author [7] and will not be given in this paper.


2. Outline of proof. In showing asymptotic normality our method will em- 
ploy moments. We show that the moments converge to the moments of the 
normal distribution. From this it follows (by a theorem in Uspensky [4]) that 
the distribution of our random variable converges uniformly to the normal 
distribution. 


3. Main results. With α = r/N, α > 0, we define a random variable $X_i$ as follows:


$$X_i = \begin{cases} 1 & \text{if cell } i \text{ is unoccupied after } r \text{ tosses,} \\ 0 & \text{otherwise.} \end{cases}$$
Assuming all N events are equally likely and that the r trials are independent,


$$E(X_i) = \left(1 - \frac{1}{N}\right)^r = \left(1 - \frac{1}{N}\right)^{\alpha N}.$$


Let X equal the number of unoccupied cells,


$$X = \sum_{i=1}^{N} X_i,$$


$$E(X) = N\left(1 - \frac{1}{N}\right)^r = N\left(1 - \frac{1}{N}\right)^{\alpha N},$$

$$\lim_{N \to \infty} \frac{E(X)}{N} = e^{-\alpha}.$$


As N becomes infinite, E(X) becomes infinite but E(X)/N approaches a finite limit.
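The behavior of E(X)/N can be checked numerically with a short sketch. The code below is illustrative only; the function name is hypothetical. It also computes the exact variance of X from the probability that two given cells are both empty, and compares Var(X)/N with e^{−α} − (1 + α)e^{−2α}, which is the standard limiting value for the classical occupancy problem and is assumed here rather than quoted from the text.

```python
import math


def empty_cell_moments(N, r):
    """Exact mean and variance of the number of empty cells.

    N cells (N > 2), r objects tossed independently and uniformly.
    """
    p1 = math.exp(r * math.log1p(-1.0 / N))  # P(a given cell is empty)
    p2 = math.exp(r * math.log1p(-2.0 / N))  # P(two given cells are both empty)
    mean = N * p1
    var = N * p1 + N * (N - 1) * p2 - mean * mean
    return mean, var


if __name__ == "__main__":
    alpha = 1.0
    for N in (100, 1000, 100000):
        mean, var = empty_cell_moments(N, int(alpha * N))
        print(N, mean / N, var / N)
    print("limits:", math.exp(-alpha),
          math.exp(-alpha) - (1 + alpha) * math.exp(-2 * alpha))
```

Both ratios settle toward finite limits as N grows with α fixed, which is the scaling under which the moment argument of this section is carried out.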

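We will prove that the random variable X has an asymptotically normal distribution by showing that


$$\lim_{N \to \infty} \frac{\mu_k}{(\sigma^2)^{k/2}} = \begin{cases} 1 \cdot 3 \cdots (k - 1) & \text{for } k \text{ even,} \\ 0 & \text{for } k \text{ odd.} \end{cases}$$

The general kth moment, $\mu_k$, is

$$\mu_k = E(X - E(X))^k = \sum_{r=0}^{k} (-1)^r \binom{k}{r} E(X^{k-r})\,(E(X))^r.$$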

As shown in Theorem 1 of Section II, by using Stirling numbers of the first 
and second kind, yu, can be expressed as follows: 


k k-—-r p i - aN aNr 
m= DEE Cow'(")wrsysr.(1-B) -f)". 
r=0 p=1l j=1 r N N 


It can be shown (see [5]) that 


(1 = 2" (1 BY" = ept-oln + olen] ete 
N N *E P *P1L-& G+ pn |’ 


Now 


exp | -2 ov + e | = > Zz a+ --as" J 


Wat DNe rn 
t—1 (t + 1)N {mss Deqime=ny Mal Me! N 


Cee 62. 
n=0 N - 
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where 


. “+8 7 
+1 
Substituting above and noting that S), = S:_, = 0 we have 


n= DoD (‘)s sate F KG, 9) 7 


r=0 s=—r a0 





a; 


where 
pt+r=s 
i+ 


ll 
e 


Collecting like powers of N 


k 
| e 'bi.v.0 + a Dy vga totes He "Dee, | 


s=r+1 


E 
lI 


k 
(1) = w|Ee (Zp. wanes 
vk 2 n=0 
N° ya [ Dem 100) | + R(r, N) 
where 
= >) (-1)’ On-. — r) 
r=() yr 

= A's F(()) 
and 

ik/2] = k/2 for k even 

= k —1 for /: odd 
2 


As shown in [6] bs..4n,. is the ‘th difference of f(0). By Lemma 1, 
Beas 0 for » > k/2 


ck! for v k/2 


where c is the product of the coefficients of the highest degree terms in r of 
Sir, S:—r and K,,(r, 8). 
For a given k, R(r, N) is a bounded function of r and N. This is an immediate 


consequence of the analyticity of y,. From (1), the highest power of N in 
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R(r, N) is N“”'. Therefore 
R(r, N) = O(N™”!*), 


Incorporating these results in (1) for k even 


” 
mw, = N‘? z. e “a jk) + O(N" — 
a=k /2 


where 
a—k/2 a—k/2 
a,(k) = _ by e/2enin = DH ck! 


n=0 n=) 


Using Lemma 1, it follows that 


a,(I) = Dyy20(—1)"(a + 1)" ey 
where 
Dij2o = 1:3 +++ (k — 1) 
h=s8s-—k/2 
Substituting above 
we = N*?(e-* — (a + 1)" Dao + O(N") 
Noting that o° = we, forming the ratio 


_ D, aa tg * — (% =p le *)* . ot O(N*?*") 


(o?)#/? ~ Ne 2(¢ —2 —_ (a + le") 2 + O(N 2) 
dividing numerator and denominator by N*” and then letting N — « 
-™ ea te Dijo for k = 2,4,--- 


‘or k an odd positive integer, 


lim _** - = (). 


now (o7)*/? 


This follows from the fact that b,.4n. = 0 for v = k/2 as v being a positive 
integer cannot equal //2. Therefore, 


mw = O(N“-””) 


while 
Section II
THEOREM 1.


k k—r p 


m= 2 > (-0"( 


iii a Pp aN 1 aNr 
N Sp Si—, 1 — V l — . 


where Sj, and St_, are Stirling numbers of the first and second kind respectively 
PROOF. 


r=0 p=1 j=1 7 


(2) 


k 


we = E(X — E(X))* = 0 (-v) (‘) E(X*)[E(X)’. 


By the multinomial expansion 


with 
A(s;) = 1 for 8s; >0 
= (0 for s, = 0 
and 
N 
> Xs) = p l<pées 
t=1 
Xi = Xi? as X; = Oor 1. 
we have 
rs - s! N rr(a1) rh(ay 
X = 2 » ' ( ) xyes xy 
P=l (4,:5N r(e)—p 85° °°Sy! \P 
j “8+ det gan] s 
} >¥ 1ai=8 { 
From Jordan [5} 
cP s! l - 
‘ Pp! ( es:8;>0) S&1!---s,! 
\ Sp 


(ae 288) 


Substituting above to eliminate the second summation and taking expectations, 


K(X") = p:(*) sreci.-.xe™) 
p=l P 


> wv), (1 - ey" s? 


Pp 
(N), = 2 gn’ 
J 


From Jordan [5] 
Substituting above in (2) with

$$[E(X)]^r = N^r \left(1 - \frac{1}{N}\right)^{\alpha N r}$$

yields the desired result.


THEOREM 2. The degree of $K_n(r, s)$ defined in equation (1), considered as a polynomial in r, is obtained from the term of the summation in which $m_1 = n$ and $m_2 = m_3 = \cdots = m_n = 0$.


PROOF. The highest power of r in $a_i$ is i + 1. For a given n we have to determine $m_1, \cdots, m_n$ which will maximize the highest power of r subject to the restriction that


$$\sum_{i=1}^{n} i\,m_i = n.$$


Maximizing the power of r is equivalent to maximizing


$$2m_1 + 3m_2 + \cdots + (n + 1)m_n = n + \sum_{i=1}^{n} m_i.$$


Maximizing $\sum_{i=1}^{n} m_i$ subject to the above restraint yields

$$\sum_{i=1}^{n} m_i = n - \sum_{i=2}^{n} (i - 1)m_i.$$


The maximum is attained when $m_i = 0$; i = 2, ···, n. Therefore, the power of r is maximized when $m_1 = n$. From the definition of $a_i$, it is readily seen that the degree of r in $K_n(r, s)$ is 2n and that the coefficient of this highest degree term is $(-\alpha)^n/2^n n!$.
LemMMA 1. Let b..ein.n be defined as above. Then 
b, ttnn = 0 for v> k/2 
= ck! for v= k/2 


whe re 


— teu 2—n) ,0 Dx~),0 (—a)" 
(2(s — k/2—n]! fe—s)|! nt2 

Proor. From Jordan [5], S, ” and 8;~™ are polynomials in n of degree 2m, 
1.€.: 


(n)on : 7 
Cas (3; y! + terms in n of degrees less than 2m 
2m)! 


(n) om 
(2m)! 


Dano + terms in n of degrees less than 2m 
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where 


Cae = (— i"Bas 


As the product of a finite number of polynomials is also a polynomial, 


,vtn 


Ss- r "Si - -K,(r, 8) 


is also a polynomial. Its degree in r for fixed v, s, n is 


2(s — v — n) + 2k — 8) + 2n = 2k — v) 
It follows from elementary properties from the calculus of finite differences that 
brcann =O for v>k/2 
ck! for v= k/2 


: _" - : rp : 

where c is the coefficient of r° in the product polynomial. That ¢ is the product 
of the above three factors is apparent from the polynomial expansion of Stirling 
numbers and from Theorem 2. 


Acknowledgment. I am deeply indebted to Dr. Joseph F. Daly whose helpful 
guidance made this work possible. 
AN EXTENSION OF BOX'S RESULTS ON THE USE OF THE
F DISTRIBUTION IN MULTIVARIATE ANALYSIS


By Seymour Geisser and Samuel W. Greenhouse


National Institute of Mental Health 


1. Introduction and summary. The mixed model in a 2-way analysis of variance is characterized by a fixed classification, e.g., treatments, and a random classification, e.g., plots or individuals. If we consider k different treatments each applied to every one of n individuals, and assume the usual analysis of variance assumptions of uncorrelated errors, equal variances and normality, an appropriate analysis for the set of nk observations $x_{ij}$, i = 1, 2, ···, n, j = 1, 2, ···, k, is


Source           D.F.               F

Treatments       k − 1              F = (mean square for treatments)/(mean square for treat. × ind.)
Individuals      n − 1
Treat. × Ind.    (k − 1)(n − 1)


where the F ratio under the null hypothesis has the F distribution with (k − 1) and (k − 1)(n − 1) degrees of freedom. As is well known, if we extend the situation so that the errors have equal correlations instead of being uncorrelated, the F ratio has the same distribution. Under the null hypothesis, the numerator estimates the same quantity as the denominator, namely, $(1 - \rho)\sigma^2$, where ρ is the constant correlation coefficient among the treatments. This case can also be considered as a sampling of n vectors (individuals) from a k-variate normal population with variance-covariance matrix


$$V = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$

If we consider this type of formulation and suppose the k treatment errors to have a multivariate normal distribution with unknown variance-covariance matrix (the same for each individual), then the usual test described above is valid for k = 2. For k > 2, and n ≥ k, Hotelling's $T^2$ is the appropriate test for the homogeneity of the treatment means. However, the working statistician is sometimes confronted with the case where k > n, or he does not have the adequate means for computing large order inverse matrices and would therefore like to use the original test ratio which in general does not have the requisite F distribution. Box [1] and [2] has given an approximate distribution of the test ratio to be $F[(k - 1)\epsilon, (k - 1)(n - 1)\epsilon]$ where ε is a function of the population variances and covariances and may further be approximated by the sample variances and covariances. We show in Section 3 that $\epsilon \ge (k - 1)^{-1}$, and therefore a conservative test would be F(1, n − 1).

Box referred only to one group of n individuals. We shall extend his results to a frequently occurring case, namely, the analysis of g groups where the αth group has $n_\alpha$ individuals, α = 1, 2, ···, g, and $\sum_\alpha n_\alpha = N$. We will show that the treatment mean square and the treatment × group interaction can be tested in the same approximate fashion by using the Box procedure.


2. Extension to g groups. Consider a mixed model, k treatments, each applied to N individuals where the N individuals are subdivided into g groups so that we have chosen a random sample of $n_\alpha$ individuals from the αth group. The observations are $x_{ij\alpha}$, i = 1, ···, $n_\alpha$, j = 1, ···, k, α = 1, ···, g and


$$\sum_{\alpha=1}^{g} n_\alpha = N.$$


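Therefore we get the following array for the αth group


$$\begin{matrix} x_{11\alpha} & \cdots & x_{1k\alpha} \\ \vdots & & \vdots \\ x_{n_\alpha 1\alpha} & \cdots & x_{n_\alpha k\alpha} \end{matrix}$$

We may consider the joint distribution of the $x_{ij\alpha}$ to be represented by the vector variable

$$x' = (x_{111} \cdots x_{1k1} \cdots x_{n_1 11} \cdots x_{n_1 k1} \cdots x_{11g} \cdots x_{1kg} \cdots x_{n_g 1g} \cdots x_{n_g kg})$$

where $Ex' = \mu'$ and x' has a kN multivariate normal distribution with variance-covariance matrix


$$\Lambda = \begin{pmatrix} V_1 & & 0 \\ & \ddots & \\ 0 & & V_g \end{pmatrix}, \qquad V_\alpha = \begin{pmatrix} V & & 0 \\ & \ddots & \\ 0 & & V \end{pmatrix},$$


where V is a matrix of order k, $V_\alpha$ is of order $kn_\alpha$ and Λ is of order kN.
Let $Ex_{ij\alpha} = \mu_{j\alpha}$ and


$$N^{-1} \sum_{\alpha=1}^{g} n_\alpha \mu_{j\alpha} = \mu_{j.}$$ is the mean of the jth treatment,


$$k^{-1} \sum_{j=1}^{k} \mu_{j\alpha} = \mu_{.\alpha}$$ is the mean of the αth group, and


$$k^{-1} \sum_{j=1}^{k} \mu_{j.} = N^{-1} \sum_{\alpha=1}^{g} n_\alpha \mu_{.\alpha} = \mu_{..}$$ the grand mean.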
TABLE 1


Source                          D.F.               F


Treatments                      k − 1              F_1 = (N − g)Q_1/Q_5
Groups                          g − 1              F_2 = (N − g)Q_2/(g − 1)Q_3
Ind. Within Groups              N − g

Treat. × Groups                 (k − 1)(g − 1)     F_3 = (N − g)Q_4/(g − 1)Q_5
Treat. × Ind. Within Groups     (k − 1)(N − g)

Total                           Nk − 1


We will now partition the total sum of squares into 5 constituent sums of 
squares, as one would usually do with a mixed model that satisfied all the usual 
analysis of variance assumptions. 

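Let S be defined as the matrix of the quadratic form representing the correction factor which is the square of the grand total of all the observations divided by kN. S is a kN × kN matrix whose elements are all $(kN)^{-1}$. Further let a matrix M of sub-matrices $M_{\alpha\beta}$ be denoted as


$$\{M_{\alpha\beta}\} = \begin{pmatrix} M_{11} & \cdots & M_{1g} \\ \vdots & & \vdots \\ M_{g1} & \cdots & M_{gg} \end{pmatrix}.$$


If $M_{\alpha\beta} = 0$ for α ≠ β, let the resulting matrix be denoted by


$$\{M_\alpha\} = \begin{pmatrix} M_1 & 0 & \cdots & 0 \\ 0 & M_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & M_g \end{pmatrix}.$$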


k 


Q, = 27/Ar=N (Z%.;. — #...)° 
7 


I 


and 
A= {N ‘A as} iad S, 
where A, is the matrix of na X ng matrices each of which is the k X k identity 
matrix. 
Let 


Q, =2'Bz = k yi Na(Z..q — &...)’, 
a@=l 


where B = {nz'B,} — S and B, is the matrix of nz X na matrices each of 
which is of order k X k, and is of the form 


E=k"1Ai, 


where 1; = (1, ---, 1). 
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Qs = 2¢Ozr = k > = (Z:.0 =z. a); 


Gel i=l 
where C = {nz’C.} and Ca = naE« — Ba, where 
EO --- Q 


Ka 0 


dg 
= 7’Dx = 2 Na 
a=l 
where 


D = {nz'(Aa — Ba)} + {N"(Bas — Acs)}; 


and here Mag is a matrix of mz X mg matrices each of order k where here A ag 
refers to the matrix of identity matrices and Bag refers to matrices of F’s 
Let 


Qs = x’Fx = > > > (Zija — L..ja ee 


a=] j=l t=1 
where F = {na Fa} and F, = nla — Ea) + Ba — Aa, Where 
I-—E .:--: 0 
— E, = : : 
0 --+ J—RF 


Now it is easy to show even for arbitrary V, the basic matrix of Λ, that

$$A\Lambda F = D\Lambda F = B\Lambda C = B\Lambda F = C\Lambda D = 0.$$


Hence by a result due to Carpenter [8], $Q_1$ and $Q_4$ are independent of $Q_5$, $Q_2$ is independent of $Q_3$ and $Q_5$, and $Q_3$ is independent of $Q_4$. Further as Box has shown if $Q = (x - \mu)'M(x - \mu)$ where x' has variance-covariance matrix Λ, the sth cumulant of Q, $K_s(Q) = 2^{s-1}(s - 1)!\,\mathrm{Tr}\,(\Lambda M)^s$ where Tr stands for the trace of a matrix. Hence by straightforward algebra we get


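$$K_1(Q_1) = \mathrm{Tr}\,V - \mathrm{Tr}\,EV + N \sum_{j=1}^{k} (\mu_{j.} - \mu_{..})^2,$$

$$K_1(Q_2) = (g - 1)\,\mathrm{Tr}\,EV + k \sum_{\alpha=1}^{g} n_\alpha (\mu_{.\alpha} - \mu_{..})^2,$$

$$K_1(Q_3) = (N - g)\,\mathrm{Tr}\,EV,$$

$$K_1(Q_4) = (g - 1)(\mathrm{Tr}\,V - \mathrm{Tr}\,EV) + \sum_{\alpha=1}^{g} n_\alpha \sum_{j=1}^{k} (\mu_{j\alpha} - \mu_{j.} - \mu_{.\alpha} + \mu_{..})^2,$$

$$K_1(Q_5) = (N - g)(\mathrm{Tr}\,V - \mathrm{Tr}\,EV),$$

and

$$K_2(Q_1) = 2\,\mathrm{Tr}\,(\Lambda A)^2 = 2\,\mathrm{Tr}\,(V - EV)^2 \quad \text{if } \mu_{j.} = \mu_{..},$$

$$K_2(Q_2) = 2\,\mathrm{Tr}\,(\Lambda B)^2 = 2(g - 1)\,\mathrm{Tr}\,(EV)^2 \quad \text{if } \mu_{.\alpha} = \mu_{..},$$

$$K_2(Q_3) = 2\,\mathrm{Tr}\,(\Lambda C)^2 = 2(N - g)\,\mathrm{Tr}\,(EV)^2,$$

$$K_2(Q_4) = 2\,\mathrm{Tr}\,(\Lambda D)^2 = 2(g - 1)\,\mathrm{Tr}\,(V - EV)^2 \quad \text{if } \mu_{j\alpha} - \mu_{j.} - \mu_{.\alpha} + \mu_{..} = 0,$$

$$K_2(Q_5) = 2\,\mathrm{Tr}\,(\Lambda F)^2 = 2(N - g)\,\mathrm{Tr}\,(V - EV)^2.$$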


From the first cumulants it is clear that under the null hypothesis of no treatment differences, the Expected Mean Square (E.M.S.) for $(k - 1)^{-1}Q_1$ is $(k - 1)^{-1}(\mathrm{Tr}\,V - \mathrm{Tr}\,EV)$; under the null hypothesis of no group × treatment interaction, the E.M.S. of $(g - 1)^{-1}(k - 1)^{-1}Q_4$ is $(k - 1)^{-1}(\mathrm{Tr}\,V - \mathrm{Tr}\,EV)$, while the E.M.S. of $(N - g)^{-1}(k - 1)^{-1}Q_5$ is just $(k - 1)^{-1}(\mathrm{Tr}\,V - \mathrm{Tr}\,EV)$. Hence under the hypothesis that the treatment means are equal, the numerator and denominator of $F_1$ estimate the same quantity; and under the hypothesis of no interaction, the numerator and denominator of $F_3$ estimate the same quantity. Similarly under the hypothesis of no group differences, the numerator and denominator of $F_2$ estimate the same quantity.

Now using the results of Box ({1], Theorem 6.1) on the approximate distribu- 
tion of linear sums of chi-square variates, it is clear that F; is approximately 
distributed like F[(k — l)e, (k — 1)(N — g)e] and F3 is approximately like 


Fl(g — 1)(k — le, (k — 1)(N — g)el| while it is obvious that F: is exactly 
distributed like F(g — 1, N — g), where (Box [2]) 


= k* (ou — b..)?/(k — 1) )(E Let, - 28 


s=1 


and v,, are the elements of V, d,,; is the mean of the diagonal terms, d;. is the mean 
of the ‘th row (or ‘th column) and @., is the grand mean. This result is easily 
extended to the fixed interactions in an r-way classification where one of the ways 
is individuals divided into g-groups and the other r — 1 classifications are fixed, 


3. A lower bound on $\epsilon$. Clearly, the formulation of the degrees of freedom with which we enter the F-table requires the computation of the elements of the variance-covariance matrix. We now present a lower limit on $\epsilon$ independent of these elements. This limit, although obvious and simple, may be too conservative.

From Theorem 6.1 of Box [1], it is easy to show that

$$\epsilon = (k - 1)^{-1}\{\mathrm{Tr}\,(V - EV)\}^2/\mathrm{Tr}\,(V - EV)^2 = (k - 1)^{-1}\Big(\sum_{j=1}^{k}\lambda_j\Big)^2\Big/\sum_{j=1}^{k}\lambda_j^2,$$

where $\lambda_j$ $(j = 1, \cdots, k)$ are the latent roots of $(V - EV)$ and are non-negative. But $(\sum\lambda_j)^2 \ge \sum\lambda_j^2$. Therefore $\epsilon \ge (k - 1)^{-1}$. Hence, $F_1$ is conservatively distributed like $F(1, N - g)$ and $F_3$ is conservatively distributed like $F(g - 1, N - g)$. We also note that if $V = \sigma^2 I$ (the usual analysis of variance assumption) all the roots of $V - EV$ are equal except for one which is equal to zero, so that $\epsilon = 1$ in this case.
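As an illustration of the two expressions for $\epsilon$ given above, the following sketch computes Box's element-wise formula and the trace formula for a covariance matrix supplied as a list of lists. The function names `box_epsilon` and `trace_epsilon` are hypothetical, and the example matrices are made up for the check; nothing here comes from the original paper beyond the two formulas.

```python
def box_epsilon(V):
    """Element-wise formula: k^2 (vbar_tt - vbar..)^2 divided by
    (k-1)(sum v_ts^2 - 2k sum vbar_t.^2 + k^2 vbar..^2)."""
    k = len(V)
    diag_mean = sum(V[t][t] for t in range(k)) / k
    row_means = [sum(row) / k for row in V]
    grand_mean = sum(row_means) / k
    sum_sq = sum(v * v for row in V for v in row)
    num = k * k * (diag_mean - grand_mean) ** 2
    den = (k - 1) * (sum_sq
                     - 2 * k * sum(m * m for m in row_means)
                     + k * k * grand_mean ** 2)
    return num / den


def trace_epsilon(V):
    """Trace formula: {Tr(V - EV)}^2 / ((k-1) Tr (V - EV)^2), with E = 11'/k."""
    k = len(V)
    # (EV)_{st} is the mean of column t, so W = V - EV subtracts column means.
    col_means = [sum(V[s][t] for s in range(k)) / k for t in range(k)]
    W = [[V[s][t] - col_means[t] for t in range(k)] for s in range(k)]
    tr_w = sum(W[t][t] for t in range(k))
    tr_w2 = sum(W[s][t] * W[t][s] for s in range(k) for t in range(k))
    return tr_w ** 2 / ((k - 1) * tr_w2)


# Hypothetical inputs: equal variances and covariances, and a general matrix.
uniform = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
general = [[4.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]]
eps_uniform = box_epsilon(uniform)
eps_general = box_epsilon(general)
```

For the equal-variance, equal-covariance matrix both functions should give $\epsilon = 1$, and for any covariance matrix the value should lie between $(k - 1)^{-1}$ and 1, which is the range the conservative test relies on.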


4. A joint test of groups and treatment × group interaction. In psychological problems it is sometimes necessary to test whether several groups form one cluster. This is equivalent to testing jointly groups and group × treatment interaction. The proposed test here is

$$F_0 = (N - g)Q'/(g - 1)Q,$$

where

$$Q' = Q_2 + Q_4 \quad\text{and}\quad Q = Q_3 + Q_5.$$

It is clear from Section 2 that the numerator and denominator are independent and

$$\begin{aligned}
K_1(Q') &= (g - 1)\,\mathrm{Tr}\,V + k\sum_{\alpha=1}^{g} n_\alpha(\mu_{.\alpha} - \mu_{..})^2 + \sum_{\alpha=1}^{g} n_\alpha\sum_{j=1}^{k}(\mu_{j\alpha} - \mu_{j.} - \mu_{.\alpha} + \mu_{..})^2,\\
K_1(Q) &= (N - g)\,\mathrm{Tr}\,V;
\end{aligned}$$

and if $\mu_{.\alpha} = \mu_{..}$, $\mu_{j\alpha} - \mu_{j.} - \mu_{.\alpha} + \mu_{..} = 0$, then

$$K_2(Q') = 2(g - 1)\,\mathrm{Tr}\,V^2, \qquad K_2(Q) = 2(N - g)\,\mathrm{Tr}\,V^2,$$

and again by using (Theorem 6.1 [1]), $F_0$ is approximately distributed like $F[(g - 1)k\epsilon', (N - g)k\epsilon']$, where

$$\epsilon' = (\mathrm{Tr}\,V)^2/(k\,\mathrm{Tr}\,V^2).$$

Further it is easy to show that $\epsilon' \ge k^{-1}$ independent of the population variances and covariances, and a conservative test would be $F(g - 1, N - g)$. The rationale for this test is that the numerator and denominator of $F_0$ estimate the same quantity under the null hypothesis of no group effects and no treatment × group effects.

It is of interest to point out and make more explicit the relationship between the foregoing discussion and the general hypothesis in multivariate analysis of the equality of vector means among $g$ populations where all the variables are measured in the same metric. This latter is

$$H_0: (\mu_1 = \mu_2 = \cdots = \mu_g),$$

where $\mu_\alpha' = (\mu_{1\alpha}, \mu_{2\alpha}, \cdots, \mu_{k\alpha})$ is the vector mean of the $\alpha$th group (i.e., multivariate normal population). But the joint test on groups and group × treatment interaction just presented is in effect also a test for the equality of the $g$ vector means, since the joint null hypothesis of no interaction and equal group means is equivalent to

$$\mu_{j\alpha} - \mu_{j.} - \mu_{.\alpha} + \mu_{..} = 0 \quad\text{for all } j, \alpha,$$
$$\mu_{.\alpha} = \mu_{..} \quad\text{for all } \alpha,$$

which is easily seen to imply $\mu_{j\alpha} = \mu_{j.}$ for all $\alpha$, which is equivalent to $\mu_1 = \mu_2 = \cdots = \mu_g$. Therefore, if the variance-covariance matrices in the $g$ groups can be assumed equal, an approximate test on the hypothesis of equal vector means in multivariate analysis is the $F_0$ test with $\epsilon'$ approximated from the sample variances and covariances. It is clear that the conservative F-test, which is independent of $\epsilon'$, can also be used in this case. Furthermore we shall show that if the variance-covariance matrices are not assumed equal, the conservative F-test can be used with the restriction that $n_\alpha = n$.


5. Remarks on unequal variance-covariance matrices. One of the basic assumptions was that each of the $N$ individuals had the same variance-covariance matrix. However, if $n_\alpha = n$ for $\alpha = 1, \cdots, g$, then we need only assume that individuals in the same group have the same variance-covariance matrix, while these variance-covariance matrices may vary from group to group. In this case we get unbiased numerators and denominators of the test ratios as before and the same approximate distributions can be derived, but now the numerator and denominator degrees of freedom have different adjustment factors, each depending upon the different covariance matrices. However, it can be easily shown that the lower bounds on these $\epsilon$'s are such that $F_0$, $F_1$, $F_2$, and $F_3$ all have the same conservative F-test, namely, $F(1, n - 1)$.


6. Acknowledgment. We are indebted to the referee for detecting an error 
and for several suggestions which have improved the exposition.


A PROPERTY OF ADDITIVELY CLOSED FAMILIES OF DISTRIBUTIONS

By Edwin L. Crow

Boulder Laboratories, National Bureau of Standards


1. Introduction. The property that a linear combination of independent $\chi^2$ variables with coefficients other than unity (or zero) is not distributed as $\chi^2$ has for long been tacitly understood or explicitly stated in studies of the distribution of quadratic forms, the Behrens-Fisher problem, and the precision of estimates of variance components, and in the derivation of tests for the analysis of variance of unbalanced designs. The earliest explicit statement known to the author occurs as a special case of a corollary given without proof by Cochran [2]. A proof depending on the form of the moment-generating function of $\chi^2$ was given by James [6]. The purpose of this note is to state and prove the analogous property for a general class of closed families of distributions, on the basis of work by Teicher [8].

DEFINITION. A one-parameter family of univariate cumulative distribution functions $F(x; \lambda)$ is additively closed if, for any two members $F(x; \lambda_1)$ and $F(x; \lambda_2)$,

$$F(x; \lambda_1) * F(x; \lambda_2) = F(x; \lambda_1 + \lambda_2),$$

where $*$ denotes convolution.


2. Principal Result.

THEOREM. Consider a one-parameter additively closed family of univariate cumulative distribution functions $F(x; \lambda)$, where $\lambda$ is (i) any positive integer, (ii) any positive rational, or (iii) any positive real number (except that in case (iii) it is required that $\phi(t; \lambda)$, the characteristic function of $F(x; \lambda)$, be either continuous in $\lambda$ or real-valued for real $t$). Let three cumulants with orders $j$, $j + h$, $j + 2h$ ($j$, $h$ positive integers) exist and be non-zero. If $j$ is odd or both $j$ and $h$ are even, also let $F(x; \lambda) = 0$ for $x < 0$ and $F(x; \lambda) > 0$ for $x > 0$. Then the only linear combinations of a finite number of independent variables with distributions in the family, $\sum_{r=1}^{k} c_r X_r$ ($c_r \ne 0$, real), whose distributions are also in the family are those with all $c_r = 1$.

PROOF. According to Theorems 1 and 2 of [8] the characteristic function of a member of the family is of the form $[f(t)]^\lambda$, where $f(t)$ is a characteristic function not depending on $\lambda$. Let $\lambda_r$ be the value of the parameter of the distribution of $X_r$. If $\sum_{r=1}^{k} c_r X_r$ is to have its distribution in the family for some $\lambda$, then

$$\prod_{r=1}^{k}[f(c_r t)]^{\lambda_r} = [f(t)]^\lambda. \tag{1}$$

Since the cumulants of all orders through $j + 2h = m$ exist,

$$\log f(t) = \sum_{\nu=1}^{m}\frac{\kappa_\nu}{\nu!}(it)^\nu + o(t^m) \tag{2}$$

in some neighborhood of $t = 0$, where the $\kappa_\nu$ are the cumulants of the distribution corresponding to $f(t)$. Hence

$$\sum_{r=1}^{k}\lambda_r\sum_{\nu=1}^{m}\frac{\kappa_\nu}{\nu!}(ic_r t)^\nu + o(t^m) = \lambda\sum_{\nu=1}^{m}\frac{\kappa_\nu}{\nu!}(it)^\nu + o(t^m),$$

or

$$\sum_{\nu=1}^{m}\frac{\kappa_\nu}{\nu!}(it)^\nu\Big(\sum_{r=1}^{k}\lambda_r c_r^\nu - \lambda\Big) + o(t^m) = 0. \tag{3}$$

For this to be true as $t$ approaches zero the coefficients of $t, t^2, \cdots, t^m$ must be zero. Since $\kappa_j$, $\kappa_{j+h}$, and $\kappa_{j+2h}$ are not zero,

$$\sum_{r=1}^{k}\lambda_r c_r^{j} = \lambda, \qquad \sum_{r=1}^{k}\lambda_r c_r^{j+h} = \lambda, \qquad \sum_{r=1}^{k}\lambda_r c_r^{j+2h} = \lambda. \tag{4}$$

Multiplying both sides of the second equation by 2 and subtracting them respectively from the sums of the corresponding sides of the first and third equations¹ give

$$\sum_{r=1}^{k}\lambda_r c_r^{j}(1 - c_r^{h})^2 = 0. \tag{5}$$

Since $\lambda_r > 0$ and the $c_r$ are real and not zero, this equation and an even value of $j$ imply that $c_r^h = 1$. If in addition $h$ is odd, then $c_r = 1$. If both $j$ and $h$ are even or if $j$ is odd, then the conditions $F(x; \lambda) = 0$ for $x < 0$ and $F(x; \lambda) > 0$ for $x > 0$ imply that $c_r > 0$ as shown below. Hence in these cases also it follows that all $c_r$ are unity.
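Written out, the combination of the three equations (4) described above is the following; this is only an expanded restatement of that step, in the notation of the proof.

```latex
\begin{align*}
\sum_{r=1}^{k}\lambda_r c_r^{j}
  - 2\sum_{r=1}^{k}\lambda_r c_r^{j+h}
  + \sum_{r=1}^{k}\lambda_r c_r^{j+2h}
  &= \lambda - 2\lambda + \lambda = 0, \\
\sum_{r=1}^{k}\lambda_r c_r^{j}\bigl(1 - 2c_r^{h} + c_r^{2h}\bigr)
  &= \sum_{r=1}^{k}\lambda_r c_r^{j}\bigl(1 - c_r^{h}\bigr)^{2} = 0 .
\end{align*}
```

When $j$ is even, or when all $c_r$ are positive, every term $\lambda_r c_r^j(1 - c_r^h)^2$ is non-negative, so the sum can vanish only if each factor $(1 - c_r^h)^2$ is zero.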

To show that the conditions $F(x; \lambda) = 0$ for $x < 0$ and $F(x; \lambda) > 0$ for $x > 0$ imply that no $c_r$ can be negative, we first note that if all $c_r$ were negative then $\sum c_r X_r$ would be negative with probability one and hence could not have its distribution in the family. We therefore suppose that there are exactly $p$ negative values of $c_r$ with $0 < p < k$, say $c_1, c_2, \cdots, c_p$. Let

$$X = -\sum_{r=1}^{p} c_r X_r, \qquad Y = \sum_{r=p+1}^{k} c_r X_r. \tag{6}$$

The cumulative distribution functions of $X$ and $Y$ are, say,

$$G(x) = F\Big(-\frac{x}{c_1}; \lambda_1\Big) * \cdots * F\Big(-\frac{x}{c_p}; \lambda_p\Big),$$
$$H(y) = F\Big(\frac{y}{c_{p+1}}; \lambda_{p+1}\Big) * \cdots * F\Big(\frac{y}{c_k}; \lambda_k\Big),$$

and thus possess the properties of $F(x; \lambda)$:

$$G(x) = 0 \text{ for } x < 0, \qquad G(x) > 0 \text{ for } x > 0; \tag{7}$$
$$H(y) = 0 \text{ for } y < 0, \qquad H(y) > 0 \text{ for } y > 0. \tag{8}$$

Then

$$\Pr\Big(\sum_{r=1}^{k} c_r X_r < 0\Big) = \Pr(Y < X) = \int_0^\infty H(x)\,dG(x). \tag{9}$$

$G(x)$ is not a degenerate distribution since $c_r \ne 0$ and its second cumulant, for example, is not zero. Hence $G(x)$ has a positive increase over some interval in which $H(x) > 0$. Hence there is a positive probability that $\sum c_r X_r < 0$. But this is impossible for any member of the family. Hence no $c_r$ can be negative.

¹ This method of combination, simpler than that used initially, was pointed out by Professor Arne Magnus.


3. Discussion of theorem. The requirement that the initial point of increase of the distributions be zero can be dropped by restricting consideration to positive $c_r$.

The theorem is satisfied with a minimum number of cumulants required if the first three (the mean, the variance, and the "skewness" measure $\kappa_3$) are not zero, provided that $F(x; \lambda) = 0$ for $x < 0$ and $F(x; \lambda) > 0$ for $x > 0$. Beyond this proviso, only the requirement $\kappa_3 \ne 0$ need be stated explicitly, since this implies $\kappa_2 \ne 0$, and since a non-negative, non-degenerate random variable must have $\kappa_1 \ne 0$. However, the condition $\kappa_3 \ne 0$ is not necessary for the conclusion of the theorem, as shown by the example of the additively closed family of binomial distributions with $p = \frac{1}{2}$ and parameter the sample size. Although $\kappa_3 = 0$ in this case, the theorem applies with $\kappa_2$, $\kappa_4$, and $\kappa_6$ all non-zero.

If the three non-zero cumulants used in the theorem include $\kappa_2$, it need not be explicitly stated that $\kappa_2 \ne 0$, since $\kappa_{j+2h} \ne 0$ implies $\kappa_2 \ne 0$.

A requirement that $\kappa_1 \ne 0$ would by itself exclude two cases for which the conclusion of the theorem is false: normal with mean zero and variance $\lambda$; Cauchy with median zero and semi-interquartile range $\lambda$. However, even $\kappa_1 \ne 0$ and the further conditions $\kappa_2 \ne 0$ and $c_r > 0$ are not sufficient for the conclusion of the theorem. Consider the one-parameter family of normal distributions with variance $\lambda$ and mean $\gamma\lambda$, where $\lambda > 0$ and $\gamma \ne 0$. The distribution of $c_1X_1 + c_2X_2$ is normal with mean $\gamma$ times the variance if

$$c_1 = \tfrac{1}{2}\big[1 + (1 + 4a/\lambda_1)^{1/2}\big], \qquad c_2 = \tfrac{1}{2}\big[1 \pm (1 - 4a/\lambda_2)^{1/2}\big],$$

where $0 < a \le \lambda_2/4$. Thus this family does not satisfy the conclusion of the theorem although $\kappa_1 \ne 0$, $\kappa_2 \ne 0$, $c_1 > 0$, $c_2 > 0$.
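The counterexample can be checked numerically. The sketch below, with hypothetical function names and made-up parameter values, computes the coefficients from the two formulas above and confirms that the mean of $c_1X_1 + c_2X_2$ equals $\gamma$ times its variance, so that the combination is again in the family even though neither coefficient is 1.

```python
import math


def counterexample_coefficients(lam1, lam2, a):
    """Return c1 and the two admissible values of c2 for 0 < a <= lam2 / 4."""
    if not (0 < a <= lam2 / 4):
        raise ValueError("a must satisfy 0 < a <= lam2 / 4")
    c1 = 0.5 * (1 + math.sqrt(1 + 4 * a / lam1))
    root = math.sqrt(1 - 4 * a / lam2)
    return c1, (0.5 * (1 + root), 0.5 * (1 - root))


def mean_and_variance(c1, c2, lam1, lam2, gamma):
    """Mean and variance of c1*X1 + c2*X2 with X_r ~ Normal(gamma*lam_r, lam_r)."""
    mean = gamma * (c1 * lam1 + c2 * lam2)
    variance = c1 ** 2 * lam1 + c2 ** 2 * lam2
    return mean, variance


# Hypothetical parameter values chosen only for the check.
lam1, lam2, a, gamma = 1.0, 2.0, 0.25, 3.0
c1, (c2_plus, c2_minus) = counterexample_coefficients(lam1, lam2, a)
checks = [mean_and_variance(c1, c2, lam1, lam2, gamma)
          for c2 in (c2_plus, c2_minus)]
```

In both cases the returned mean should equal `gamma` times the returned variance, which is exactly the relation that defines membership in this normal family.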

This example leads to the conclusion that, among distributions with moments of all orders, the condition that some three cumulants of the form $\kappa_j$, $\kappa_{j+h}$, and $\kappa_{j+2h}$ not be zero, although it has not been shown necessary for the conclusion of the theorem, is little stronger than necessary. Specifically, we can prove that any additively closed family of non-degenerate distributions with all moments existing (and, in case (iii), characteristic function continuous in $\lambda$) which satisfies the conclusion of the theorem must have at least one $\kappa_j \ne 0$ with $j > 2$. For suppose this were not true. Since all moments exist, so do all cumulants. All cumulants beyond $\kappa_2$ would be zero. Consequently the distributions would be normal. By Teicher's Theorem 1 [8] a one-parameter additively closed family of non-degenerate normal distributions with, in case (iii), characteristic functions continuous in $\lambda$ ($\lambda > 0$) must have those characteristic functions of the form

$$[f(t)]^\lambda = e^{i\mu t - \sigma^2 t^2/2}, \tag{10}$$

where $\sigma > 0$ and $f(t)$ is a characteristic function not depending on $\lambda$. Hence

$$\sigma^2 = a\lambda, \qquad \mu = \gamma\lambda, \tag{11}$$

where $a > 0$ and $\gamma$ is real. (In case (iii) we may let $a = 1$ without loss of generality.) Equations (10) and (11) would thus be true if the statement in question were not true. But, as shown in the preceding paragraph (where we may take the variance as $a\lambda > 0$ also), this family does not satisfy the conclusion of the theorem, contrary to the hypothesis. Hence the statement in question must be true.

Furthermore, any asymmetrical distribution with characteristic function expansible in a convergent Maclaurin series must have some non-zero $\kappa_j$ for $j > 2$ (or non-zero central moment) of odd order; this follows from the formula for the cumulative distribution function in terms of the characteristic function.

The distributions of some additively closed families are members of the Pearson system. By use of Kendall's recurrence relation for the cumulants of Pearson curves [7] it can be shown that any Pearson-type distribution except the normal for which the recurrence relation is valid has at least three non-zero cumulants of the form $\kappa_j$, $\kappa_{j+h}$, and $\kappa_{j+2h}$. A family of Pearson Type III distributions with left-hand endpoint at zero and the non-additive parameter fixed (the family of all $\chi^2$ distributions, for example) is thus an additively closed family that satisfies the theorem.

An example showing that the conditions $F(x; \lambda) = 0$ for $x < 0$ and $F(x; \lambda) > 0$ for $x > 0$ are not implicit in the conclusion of the theorem is the family of Poisson distribution functions $F(x; \lambda, b, a)$ where $\lambda > 0$, $b \ne 0$, $a \ne 0$, and $F$ is a step-function with a jump equal to $\lambda^\nu e^{-\lambda}/\nu!$ at $x = \lambda b + \nu a$, $\nu = 0, 1, 2, \cdots$ [3]. None of the cumulants is zero, so that the theorem can be applied without invoking the above conditions. Such translation can be applied more generally to additively closed families of distributions; the corresponding slight extension of the theorem is omitted.

It may be questioned whether the existence of any moments is necessary to assure the conclusion of the theorem. In cases (ii) and (iii) the general form of the characteristic function of an infinitely divisible distribution is available [8] and might be thought applicable. No appreciable results have been derived therefrom, however. The above example of Cauchy distributions shows that a restriction of some sort must be placed on an additively closed family whose moments do not exist in order to assure the conclusion.


4. Further examples. The generalized Poisson distributions associated with an arbitrary but fixed distribution [4] form a one-parameter additively closed family of distributions. The generalized Poisson distributions include the negative binomial distributions [1] and Neyman's contagious distributions [4].

The example in Section 3 of the additively closed family of normal distributions not satisfying the conclusion of our theorem can be generalized to certain families of stable distributions. Suppose that we have an additively closed family of stable distributions with additive parameter $\lambda$ of any of the three types in the theorem, $\phi(t; \lambda)$ being continuous in $\lambda$ for each $t$ if $\lambda$ is of type (iii). It follows from the general form of the characteristic function of a stable distribution [5] and from Teicher's Theorem 1 that

$$\begin{aligned}
\log\phi(t; \lambda) &= \beta(\lambda)it - \theta(\lambda)|t|^{\alpha(\lambda)}\Big[1 + i\delta(\lambda)\frac{t}{|t|}\omega(t; \lambda)\Big]\\
&= \lambda\beta(1)it - \lambda\theta(1)|t|^{\alpha(1)}\Big[1 + i\delta(1)\frac{t}{|t|}\omega(t; 1)\Big],
\end{aligned}$$

where $\alpha(\lambda)$, $\beta(\lambda)$, $\delta(\lambda)$, $\theta(\lambda)$ are real functions of $\lambda$ satisfying $0 < \alpha(\lambda) \le 2$, $|\delta(\lambda)| \le 1$, and $\theta(\lambda) \ge 0$, and where $\omega(t; \lambda) = \tan[\pi\alpha(\lambda)/2]$ or $(2/\pi)\log|t|$ according as $\alpha(\lambda) \ne 1$ or $\alpha(\lambda) = 1$.

By equating real and imaginary parts and simple computations, it is readily established that

$$\theta(\lambda) = \lambda\theta(1), \qquad \alpha(\lambda) = \alpha(1),$$
$$\beta(\lambda) = \lambda\beta(1), \qquad \delta(\lambda) = \delta(1).$$

With these conditions the corresponding family of stable distributions is indeed additively closed. Every stable distribution is in at least one of the infinitely many such families of stable distributions. If $X_1$ and $X_2$ are independent random variables whose distributions are in the above family, it can be shown that there exist constants $c_1$ and $c_2$ unequal to 0 or 1 such that $c_1X_1 + c_2X_2$ is also in the family. Thus one-parameter additively closed families of stable distributions, with $\phi(t; \lambda)$ continuous in $\lambda$ in case (iii) of the theorem, cannot satisfy the conclusion of the theorem.


5. Acknowledgment. The author is indebted to the referee for many clarifying and stimulating comments, and to M. M. Siddiqui for comments on a revised version.


NOTE ON RELATIVE EFFICIENCY OF TESTS

By Colin R. Blyth¹

University of Illinois and Stanford University


1. Summary. This note is concerned with possible definitions of relative efficiency for two sequences of tests of the same hypothesis. For two examples of one kind of definition, relative efficiencies of the Student test and sign test against normal alternatives are calculated for fixed sample size and asymptotically.


2. Introduction. Consider the following problem of relative efficiency of tests. Experiments $X_1, X_2, \cdots$ and two sequences $\{A_n(X_1, \cdots, X_n)\}$, $\{A^*_n(X_1, \cdots, X_n)\}$ of level $\alpha$ tests are available for testing the same hypothesis. We must decide whether to use an $A$ test or an $A^*$ test. Commonly one sequence, say the $A^*$'s, gives better power for given sample size, but for some reason such as wider validity we may prefer one of the "less efficient" $A$ tests.

The general decision formulation for this problem would use three loss functions: (i) cost of experimentation, (ii) loss from wrong decisions, (iii) disadvantages of using $A^*$. The usual kinds of decision problems for three loss functions could then be discussed. In practice (iii) is hard to assess and there is no natural comparability between (i) and (ii). So what is usually done is to consider (i) and (ii) only, and having required a bound on one of them, to decide whether the decrease in the other is enough to compensate for the disadvantages of using $A^*$ instead of $A$. More specifically, the following two types of problems are of interest.

(a). Fixed power requirement problems. For a given power requirement, shall we use $A_n$ or $A^*_{n^*}$? Here $n$ and $n^*$ are the smallest sample sizes for which the respective kinds of tests satisfy the given power requirement. Some function $K(n, n^*)$ such as $C(n) - C(n^*)$ or $1 - C(n^*)/C(n)$ is chosen as measuring our loss (extra experimentation cost) from using $A_n$ instead of $A^*_{n^*}$. If $K(n, n^*)$ is small enough we will prefer to use $A_n$ because of the advantages (iii) of $A$ tests. If the given power requirement is a function of an unknown parameter $\theta$, the loss $K(n, n^*)$ will also be a function of $\theta$ and so cannot be used directly for deciding between $A_n$ and $A^*_{n^*}$. Some measure of loss not dependent on $\theta$ is needed. One natural choice is the worst possible loss $\sup_\theta K(n, n^*)$. (Weighted averages over $\theta$ and limits over particular sequences of $\theta$'s have also been used.) Asymptotic behavior of $K(n, n^*)$ and $\sup_\theta K(n, n^*)$ can be investigated for sequences of power requirements forcing $n \to \infty$ and $n^* \to \infty$. The particular choice $K(n, n^*) = 1 - n^*/n$ (with $n^*/n$ being called the efficiency of $A$ relative to $A^*$) and its asymptotic properties have been of wide interest [1], [2], [3], [4].

(b). Fixed sample size problems. For a given sample size $n$ shall we use $A_n$ or $A^*_n$? Let $\beta_n$ be the power of $A_n$ and $\beta^*_n$ the power of $A^*_n$. Some function $L(\beta_n, \beta^*_n)$ such as $\beta^*_n - \beta_n$ or $1 - \beta_n/\beta^*_n$ is chosen as measuring our loss (extra wrong decisions) from using $A_n$ instead of $A^*_n$. If $L(\beta_n, \beta^*_n)$ is small enough we will prefer to use $A_n$ because of the advantages (iii) of $A$ tests. If the powers $\beta_n$ and $\beta^*_n$ are functions of an unknown parameter $\theta$, the loss $L(\beta_n, \beta^*_n)$ will also be a function of $\theta$ and so cannot be used directly for deciding between $A_n$ and $A^*_n$. Some measure of loss not dependent on $\theta$ is needed. One natural choice is the worst possible loss $\sup_\theta L(\beta_n, \beta^*_n)$. This choice appears, with $L = \beta^* - \beta$, in the definition of stringency. Asymptotic behavior of $L(\beta_n, \beta^*_n)$ and $\sup_\theta L(\beta_n, \beta^*_n)$ as $n \to \infty$ can be investigated.

Though interest has been mostly in type (a) problems, it would seem that type (b) problems should be about equal in interest and applicability. The purpose of the present note is to discuss, as an illustration of type (b) problems, the following simple example.


3. Sign Test vs. Student Test. Let $X_1, X_2, \cdots$ be independent, each with Normal $(\theta, \sigma^2)$ distribution. We are to test at level $\alpha$ the one-sided hypothesis $\{\theta \le 0\}$ against the alternative $\{\theta > 0\}$. Let $\delta = \theta/\sigma$ and $p = p(\delta) = P(X_i > 0) = F(\delta)$, where $F$ is the Normal (0, 1) cumulative. Then the number $R_n$ of positive observations among $X_1, \cdots, X_n$ has a Binomial $(n, p)$ distribution. And

$$T_n = \sqrt{n}\,\bar{X}\Big\{\sum (X_i - \bar{X})^2/(n - 1)\Big\}^{-1/2}$$

has a Student $t$ distribution with $n - 1$ degrees of freedom which is central when $\delta = 0$ and non-central with parameter $\sqrt{n}\,\delta$ in general.

The sign test $A_n$ of $\{\theta \le 0\}$ is

Reject when $R_n - n/2 > k_n$,
Reject with prob. $\gamma_n$ when $R_n - n/2 = k_n$,

where $k_n$, $\gamma_n$ are constants determined by

$$P(R_n - n/2 > k_n \mid \delta = 0) + \gamma_n P(R_n - n/2 = k_n \mid \delta = 0) = \alpha.$$

The power function of this test is

$$\beta_n(\delta) = P(R_n - n/2 > k_n) + \gamma_n P(R_n - n/2 = k_n).$$

Values of $k_n$, $\gamma_n$, $\beta_n(\delta)$ can be obtained from tables such as [5] of the binomial distribution. For large values of $n$ the normal approximation to the binomial gives

$$\beta_n(\delta) \approx F\left(\frac{\sqrt{n}\,(2p - 1) - c}{2\sqrt{p(1 - p)}}\right), \quad\text{where } F(c) = 1 - \alpha. \tag{1}$$
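The constants of the randomized sign test can also be computed directly rather than read from binomial tables. The sketch below is an illustration only: the function names are hypothetical, and it works with the cutoff $r = n/2 + k_n$ on $R_n$ itself rather than with $k_n$.

```python
from math import comb
from statistics import NormalDist


def sign_test_constants(n, alpha):
    """Return (r, gamma): reject when R_n > r, and reject with
    probability gamma when R_n == r, so that the size under p = 1/2 is alpha."""
    tail = 0.0  # P(R_n > r | p = 1/2) accumulated so far
    for r in range(n, -1, -1):
        pr = comb(n, r) / 2 ** n
        if tail + pr > alpha:
            return r, (alpha - tail) / pr
        tail += pr
    raise ValueError("alpha must be smaller than 1")


def sign_test_power(n, alpha, delta):
    """Exact power at delta = theta / sigma, using p = F(delta)."""
    p = NormalDist().cdf(delta)
    r, gamma = sign_test_constants(n, alpha)

    def pmf(x):
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    return sum(pmf(x) for x in range(r + 1, n + 1)) + gamma * pmf(r)


r2, gamma2 = sign_test_constants(2, 0.05)
size11 = sign_test_power(11, 0.05, 0.0)
```

For $n = 2$ and $\alpha = .05$ the test can only reject, with probability .2, when both observations are positive, so its power cannot exceed .2 however large $\delta$ is.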
2V p(l — p 


The Student test $A^*_n$ of $\{\theta \le 0\}$ is

Reject when $T_n > c_n$,

where $c_n$ is a constant determined by

$$P(T_n > c_n \mid \delta = 0) = \alpha.$$

The power function of this test is

$$\beta^*_n(\delta) = P(T_n > c_n).$$

Values of $c_n$ can be obtained from tables of the Student $t$ distribution, and values of $\beta^*_n(\delta)$ from tables such as [6] of the non-central Student $t$ distribution. For large values of $n$ the normal approximation to non-central Student $t$ gives

$$\beta^*_n(\delta) \approx F(\sqrt{n}\,\delta - c), \quad\text{where } F(c) = 1 - \alpha. \tag{2}$$

Loss functions such as

$$L^1_n(\delta) = L^1(\beta_n, \beta^*_n) = \beta^*_n(\delta) - \beta_n(\delta),$$
$$L^2_n(\delta) = L^2(\beta_n, \beta^*_n) = 1 - \beta_n(\delta)/\beta^*_n(\delta)$$

can easily be plotted for particular values of $n$ and $\alpha$. This is done in Figure 1 for $n = 11$, $\alpha = .05$. As $\delta$ increases from 0 each function $L^i_n(\delta)$, $i = 1, 2$, increases from 0 to a maximum and then decreases toward 0.

Fig. 1. Power functions $\beta^*$ of Student test and $\beta$ of sign test for $\alpha = .05$, $n = 11$; $L^1 = \beta^* - \beta$, $L^2 = 1 - \beta/\beta^*$.

For fixed $\alpha$ the change in appearance of these curves with increasing $n$ differs only slightly from a simple horizontal compression. The curve $L^i_n(\delta)$ rises more quickly to its maximum and then falls more quickly toward 0, with increasing $n$. The position of the maximum tends to 0 at the rate $1/\sqrt{n}$ but the maximum value changes very little and has a limit. Table 1 gives values of $\sup_\delta L^i_n(\delta)$ for $\alpha = .05$ and $n = 2, 3, \cdots, 13$. These values are computed from tables [5], [6] using interpolation and should be in error by not more than one or two units in the third decimal place. The cases $n = 2, 3, 4$ are special because for these the sign test does not reject with probability 1 even when $R_n = n$, and so the power of the sign test does not tend to 1 as $\delta \to \infty$. For $n = 5, 6, \cdots$, $\sup_\delta L^i_n(\delta)$ tends to be smaller if there is a non-randomized sign test with size close to .05 [$n = 5, 8, 10, 13$] and larger if there is no such sign test [$n = 6, 9, 12$]. Even for the smallest of these $n$ the differences from the asymptotic values $\lim_{n\to\infty}\sup_\delta L^i_n(\delta)$ are not large.

TABLE 1

Maxima of $L^1$ = power loss, $L^2$ = 1 − power ratio, of sign test relative to Student test, $\alpha = .05$

    n     sup L^1    sup L^2
    2      .800       .800
    3      .600       .600
    4      .200       .200
    5      .130       .197
    6      .189       .263
    7      .150       .212
    8      .153       .238
    9      .180       .261
   10      .142       .213
   11      .167       .202
   12      .171       .260
   13      .151       .227
    ∞      .1686      .2610

Discussion of this example is concluded with the calculation of these asymptotic values. The following easily proved result is used:

LEMMA.

$$\lim_{n\to\infty}\sup_\delta L_n(\delta) = \sup_A \lim_{n\to\infty} L_n(\delta_n)$$

if the former exists, where $A$ is the set of all sequences $\{\delta_n\}$ for which $\lim_{n\to\infty} L_n(\delta_n)$ exists. [If lim be replaced throughout by lim inf or lim sup the same result holds, with existence provisos unnecessary.]

Writing $\delta_n = a_n/\sqrt{n}$ it easily follows from (1) and (2) that if $a_n \to a$ then

$$\beta_n(\delta_n) \to F(a\sqrt{2/\pi} - c), \qquad \beta^*_n(\delta_n) \to F(a - c),$$

where $F(c) = 1 - \alpha$. This gives

$$\lim_{n\to\infty} L^1_n(\delta_n) = F(a - c) - F(a\sqrt{2/\pi} - c), \tag{3}$$

$$\lim_{n\to\infty} L^2_n(\delta_n) = 1 - F(a\sqrt{2/\pi} - c)/F(a - c). \tag{4}$$

Because of the lemma we can find $\lim_{n\to\infty}\sup_\delta L^i_n(\delta)$, $i = 1, 2$, by finding the value of $a$ giving a maximum in (3), (4). Differentiating (3) with respect to $a$ and equating the result to zero gives

$$\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}(a - c)^2\} = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}(a\sqrt{2/\pi} - c)^2\}, \tag{3'}$$

which reduces to

$$(a - c)^2 = (a\sqrt{2/\pi} - c)^2 + \log(\pi/2).$$

The root of this quadratic at which the maximum of (3) occurs is

$$a' = \frac{c}{1 + \sqrt{2/\pi}}\left\{1 + \sqrt{1 + \frac{(\log \pi/2)(1 + \sqrt{2/\pi})}{(1 - \sqrt{2/\pi})\,c^2}}\right\}.$$

The maximum value $R_1 = \lim_{n\to\infty}\sup_\delta L^1_n(\delta)$ can now be found by substituting $a'$ for $a$ in (3). For example $\alpha = .05$ gives $c = 1.6449$, $a' = 2.3570$, and $R_1 = .1686$ for the asymptotic maximum loss. Differentiating (4) with respect to $a$ and equating the result to zero gives

$$F(a - c)\sqrt{\frac{2}{\pi}}\,\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}(a\sqrt{2/\pi} - c)^2\} = F(a\sqrt{2/\pi} - c)\,\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}(a - c)^2\}. \tag{4'}$$

For given $\alpha$ the solution $a''$ of (4') can be found numerically and shown to maximize (4). The maximum value $R_2 = \lim_{n\to\infty}\sup_\delta L^2_n(\delta)$ can now be found by substituting $a''$ for $a$ in (4). For example $\alpha = .05$ gives $c = 1.6449$, $a'' = 1.5593$, and $R_2 = .2610$ for the asymptotic maximum loss.

TABLE 2

Asymptotic maximum power loss $R_1$ and proportionate power loss $R_2$ for sign test relative to Student test

    α        a'        a''       R_1      R_2
   .25     1.5514    1.1784    .0963    .1268
   .10     1.6245    1.4086    .1405    .2056
   .05     2.3570    1.5593    .1686    .2610
   .01     3.0019    1…        .2229    .3765
   .001    3.7676    …         .2844    .5128
   0       ∞         ∞         1        1
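The asymptotic values can be recomputed without tables. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $a'$ is taken from the closed-form root of the quadratic, and $a''$ is found by a simple ternary search on (4) over $0 \le a \le 10$, which assumes the loss has a single maximum on that range.

```python
from math import log, pi, sqrt
from statistics import NormalDist

STD = NormalDist()
S = sqrt(2 / pi)  # the factor sqrt(2/pi) appearing in (3) and (4)


def a_prime(alpha):
    """Closed-form maximizer of (3)."""
    c = STD.inv_cdf(1 - alpha)
    ratio = log(pi / 2) * (1 + S) / ((1 - S) * c * c)
    return c / (1 + S) * (1 + sqrt(1 + ratio))


def r1(alpha):
    """Asymptotic maximum power loss: (3) evaluated at a'."""
    c = STD.inv_cdf(1 - alpha)
    a = a_prime(alpha)
    return STD.cdf(a - c) - STD.cdf(a * S - c)


def loss2(a, c):
    """Right-hand side of (4)."""
    return 1 - STD.cdf(a * S - c) / STD.cdf(a - c)


def a_double_prime(alpha, lo=0.0, hi=10.0, iters=200):
    """Numerical maximizer of (4) by ternary search (assumes unimodality)."""
    c = STD.inv_cdf(1 - alpha)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loss2(m1, c) < loss2(m2, c):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def r2(alpha):
    """Asymptotic maximum proportionate power loss: (4) evaluated at a''."""
    c = STD.inv_cdf(1 - alpha)
    return loss2(a_double_prime(alpha), c)
```

For $\alpha = .05$ these functions should agree, up to rounding, with the values $a' = 2.3570$, $R_1 = .1686$, $a'' = 1.5593$, and $R_2 = .2610$ quoted in the text.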

Table 2 gives $a'$, $a''$, $R_1$ (the asymptotic maximum power loss), and $R_2$ (the asymptotic maximum amount by which the power ratio falls below 1) for several values of $\alpha$. The most noticeable feature of this table is the strong dependence of $R_1$ and $R_2$ on the value of $\alpha$. For small $\alpha$, use of the sign test instead of the Student test results in very severe loss of power at some alternatives. For example, when $\alpha = .001$ there is an alternative where 51% of the power is lost by using the sign test instead of the Student test, and an alternative where the amount of power lost is .28.


NOTE ON CONFIDENCE INTERVALS IN REGRESSION PROBLEMS

By John Mandel

National Bureau of Standards


This note deals with the construction of confidence intervals for arbitrary real functions of multiple regression coefficients.

Consider the usual model

$$y_\alpha = \sum_i \beta_i x_{i\alpha} + \epsilon_\alpha \tag{1}$$

in which the $\epsilon_\alpha$ are independently and normally distributed with mean zero and common variance $\sigma^2$.

It is customary to construct confidence intervals for the $\beta_i$ using Student's $t$ distribution. Alternatively, a joint confidence region can be constructed for the $\beta_i$ using critical values of the $F$ distribution. In both cases the usual statistic $s^2$, based on $N - k$ degrees of freedom, is used as an estimate of $\sigma^2$.

Durand [1] has discussed the use of the joint confidence region of the $\beta_i$, an ellipsoid in a $k$-dimensional space, for the construction of confidence intervals for linear functions, $Q = \sum h_i\beta_i$, of the regression coefficients. He points out that the chosen confidence coefficient (corresponding to the ellipsoid) is a lower bound for the joint confidence of any set of intervals thus derived.

Our first objective is to generalize this procedure by removing the restriction of linearity. Let

$$z = f(\beta_1, \beta_2, \cdots, \beta_k) \tag{2}$$

be any real function of the coefficients $\beta_i$. The form of the function is arbitrary but known.
For any arbitrarily selected value of $z$, say $z_0$, equation (2) represents a hypersurface in the $k$-dimensional parameter space of the $\beta_i$. Denote by $M[z]$ the set of all values of $z$ for which the corresponding hypersurfaces "cut" the ellipsoid, i.e., for which the equation

$$z_0 = f(\beta_1, \beta_2, \cdots, \beta_k)$$

and the quadratic equation representing the ellipsoid have at least one common real solution in the $\beta_i$.

The set $M[z]$ is, in general, a closed interval, bounded by those two values of $z$ for which the corresponding hypersurfaces are tangent to the ellipsoid. Furthermore, the event that the point corresponding to the "true" values of the $\beta_i$ is inside the ellipsoid implies that the $z$-value corresponding to these true values is an element of $M[z]$, but the converse is not necessarily true. Consequently, since the probability of the former event is equal to the confidence coefficient $1 - \alpha$, the probability of the latter event is at least $1 - \alpha$. If other functions $u = \phi(\beta_1, \beta_2, \cdots, \beta_k)$, $v = \psi(\beta_1, \beta_2, \cdots, \beta_k)$, etc., are considered simultaneously with $z$, it follows that the confidence intervals constructed by the above procedure for $z, u, v, \cdots$ are all jointly valid with a joint confidence for which $1 - \alpha$ is a lower bound.

Our next objective is to discuss, in the light of the above procedure, a regres- 
sion problem often encountered in practice. 

Consider the straight line regression 


(3) Ya = Bot Bilta — £) + €c asi... 9 


where @ = (1/N) > 4.24. Having obtained least squares estimates for Bo) and 
B; , say bo and b, , consider p ‘‘future’’ observations of y and let it be required to 
find confidence intervals for the corresponding p values of x. 

This problem involves, in addition to the random errors of the original NV 
values of y, as reflected in the random fluctuations of the least squares estimates 
bo and b; , also the random errors of the p “future” y values. Denote the “future” 
observations by ywvsi, Yvi2,°°*, Ynep, and their expected values by nw41, 
nn42,°°* » Nv+p- Consider the p + 2 dimensional space with coordinates Bb , 
Bi, Mv41, N42, °°* » Nv4p~ The joint confidence ellipsoid for these p + 2 values, 
for any given confidence coefficient, will be centered on bo , bi , Yi, Yn42, Uns s 
and can be found as follows by a generalization of a method used by Working 
and Hotelling [8]: 

The quantity 


(4) = (Go = bo)® 5 Gr = by), Lebalnwss — yrs)! 


> : 





Tbe Fb, e 
' ee : ‘ ; 2 3 
has the chi-square distribution with p + 2 degrees of freedom. o}, and o, are of 
e ° 2 2 2 r 2 2 N =\2 
course known functions of o”, 05, = o°/N and 03, = 0° / D3 (1 — 2)’. 
On the other hand, we have 


| 


(5) x: 3 


o 


a quantity distributed as chi-square with (N — 2) degrees of freedom. 
Since $\chi^2_{p+2}$ and $\chi^2_{N-2}$ are mutually independent, it follows from (4) and (5) that a
joint confidence region, with coefficient $1 - \alpha$, for $\beta_0$, $\beta_1$, and the expected
values of $y_{N+1}, \ldots, y_{N+p}$ is given by


(6)   $\frac{(\beta_0 - b_0)^2}{1/N} + \frac{(\beta_1 - b_1)^2}{1/\sum (x - \bar{x})^2} + \sum_{i=1}^{p} (\eta_{N+i} - y_{N+i})^2 \le (p + 2) F_\alpha s^2,$


where $F_\alpha$ is the critical value of the F distribution with $p + 2$ and $N - 2$ degrees
of freedom, at the $\alpha$ level of significance.
Consider now the function


      $x' = \bar{x} + \frac{\eta' - \beta_0}{\beta_1},$


where $\eta'$ is the expected value of one of the $p$ "future" observations, and $x'$
the corresponding true x-value. By the method previously outlined, confidence
limits for $x'$ are obtained by determining the two values of $x'$ for which the
hyperplane


(7)   $\eta' - \beta_0 = \beta_1 (x' - \bar{x})$


is tangent to the ellipsoid, provided that the set of values of $x'$ for which the
hyperplane (7) intersects the ellipsoid is a closed interval.


Denoting these limits by $x'_L$ and $x'_U$, it is found that the quantities $u_L = x'_L - \bar{x}$
and $u_U = x'_U - \bar{x}$ are the roots of the equation


(8)   $\left[ b_1^2 - \frac{K^2}{\sum (x - \bar{x})^2} \right] u^2 - 2 b_1 (y' - b_0)\, u + \left[ (y' - b_0)^2 - \frac{N+1}{N} K^2 \right] = 0,$


where $K^2 = (p + 2) F_\alpha s^2$.
The condition for equation (8) to have distinct real roots is


(9)   $\frac{(y' - b_0)^2}{\sum (x - \bar{x})^2} + \frac{N+1}{N} \left[ b_1^2 - \frac{K^2}{\sum (x - \bar{x})^2} \right] > 0.$


Condition (9) is necessary but not sufficient for obtaining a confidence interval
for $x'$. This is apparent from the fact that when $x'$ is made $\pm\infty$, equation (7)
represents the hyperplane $\beta_1 = 0$. Consequently, if the hyperplane $\beta_1 = 0$
intersects the ellipsoid, the parameter $x'$ will have a discontinuity when (7)
becomes $\beta_1 = 0$, and the roots $x'_L$ and $x'_U$, though distinct and real, will then
not be the limits of a confidence interval for $x'$.

The condition for $\beta_1 = 0$ not to intersect the ellipsoid is


(10)   $b_1^2 \sum (x - \bar{x})^2 > K^2.$


It can be proved that condition (10), which implies (9), is both necessary and
sufficient in order that the roots of (8) yield the limits of a confidence interval
for $x'$.
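
The calculation described by equations (8) and (10) can be sketched in a few lines. This is only an illustration, not the author's procedure: the function name `joint_inverse_interval` and its arguments are hypothetical, and $K^2 = (p + 2) F_\alpha s^2$ is assumed to be supplied by the caller, since the F critical value must come from tables or a statistics library.

```python
import math

def joint_inverse_interval(x, y, y_future, K2):
    """Limits for x' from the roots of equation (8).

    x, y      : the original N observations
    y_future  : one "future" observation y'
    K2        : K^2 = (p + 2) * F_alpha * s^2, supplied by the caller
    Returns (x_lower, x_upper), or None when condition (10) fails.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0 = sum(y) / n  # intercept of the line written about x = xbar
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx

    # Condition (10): b1^2 * sum (x - xbar)^2 > K^2
    if b1 * b1 * sxx <= K2:
        return None

    # Coefficients of the quadratic (8) in u = x' - xbar
    a = b1 * b1 - K2 / sxx
    b = -2.0 * b1 * (y_future - b0)
    c = (y_future - b0) ** 2 - (n + 1) / n * K2

    root = math.sqrt(b * b - 4.0 * a * c)  # real because (10) implies (9)
    return xbar + (-b - root) / (2 * a), xbar + (-b + root) / (2 * a)

# Illustrative data only (not from the paper)
x_obs = [1, 2, 3, 4, 5]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
print(joint_inverse_interval(x_obs, y_obs, 8.0, 0.5))
```

Returning `None` when (10) fails mirrors the remark above that the roots, even if real, are then not the limits of a confidence interval.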


If condition (10) is satisfied, the procedure leading to equation (8) can also
be carried out for the remaining $p - 1$ "future" measurements, $y''$, $y'''$, etc. In
this manner one will obtain a set of confidence intervals $(x'_L, x'_U)$, $(x''_L, x''_U)$,
$(x'''_L, x'''_U)$, etc., all of which are jointly valid with a confidence coefficient for
which $1 - \alpha$ is a lower bound. Furthermore, this lower bound will still apply if
confidence intervals are also derived for any number of real functions of $\beta_0$, $\beta_1$ and
the $p$ values $\eta_{N+1}, \eta_{N+2}, \ldots, \eta_{N+p}$.

Equation (8) should be compared to the relation obtained by the use of
Fieller's theorem [3, 4]. This theorem leads to a confidence interval for $x' - \bar{x}$
by considering it as the ratio of the two normally distributed variables $y' - b_0$
and $b_1$, whose variances are $(N + 1)\sigma^2/N$ and $\sigma^2/\sum (x - \bar{x})^2$ and whose
covariance is zero. The confidence interval, with coefficient $1 - \alpha$, thus found is
given by the roots of the equation


(11)   $\left[ b_1^2 - \frac{t_\alpha^2 s^2}{\sum (x - \bar{x})^2} \right] u^2 - 2 b_1 (y' - b_0)\, u + \left[ (y' - b_0)^2 - \frac{N+1}{N} t_\alpha^2 s^2 \right] = 0,$


where $t_\alpha$ is the critical value of Student's $t$, at the two-sided $\alpha$ level, and $u$ is
defined as above.

The only difference between equations (8) and (11) is the substitution of $K^2$
for $t_\alpha^2 s^2$, i.e., the substitution of $[(p + 2)F_\alpha]^{1/2}$ for $t_\alpha$. This substitution results in
a widening of the confidence interval, caused by the joint consideration of $p + 2$
parameters instead of the single parameter $\eta'$ (or its corresponding $x'$). It is of
interest to observe that the relation between $[(p + 2)F_\alpha]^{1/2}$ and $t_\alpha$ is precisely
that found by Scheffé [6] in establishing simultaneous confidence statements for
all means in an analysis of variance, as contrasted with individual confidence
statements based on Student's $t$.

In deciding whether in a particular application, joint or single confidence 
intervals should be used, one may be guided by the following plausible rule. 
Joint confidence intervals are indicated in situations involving two or more 
quantities that are determined as so many phases of a single problem. On the 
other hand, quantities involved in unrelated problems, even though they are 
derived from the same basic data, should not be treated jointly in deriving confi- 
dence intervals. It appears advisable, in view of this rule, to partition all the 
quantities derived from a single set of data into groups such that the quantities 
within a group, inasmuch as they correspond to the same problem, are treated
jointly for the derivation of confidence intervals; while the groups themselves 
are treated independently of each other. 

Groups involving single predictions should be treated by Fieller’s theorem, 
since there appears to be no justification, in such cases, for widening the confi- 
dence interval through inclusion of confidence statements about the slope and 
the intercept. 

It is of interest to note that the confidence interval based on equation (8) may
be obtained by drawing hyperbolic confidence limits [2] for the straight line
represented by equation (3), in accordance with the relations


(12)   $y = b_0 + b_1 (x - \bar{x}) \pm K \left[ 1 + \frac{1}{N} + \frac{(x - \bar{x})^2}{\sum (x - \bar{x})^2} \right]^{1/2},$


and by determining the x-interval defined by the intersection of the line $y = y'$
with the two branches of this hyperbola. It is readily seen that the condition
that such an x-interval exists and be of finite length is equivalent to the
condition that the two asymptotes of the hyperbola have slopes of equal sign. Since
these slopes are $b_1 - K/[\sum (x - \bar{x})^2]^{1/2}$ and $b_1 + K/[\sum (x - \bar{x})^2]^{1/2}$, the condition
in question is $b_1^2 - K^2/\sum (x - \bar{x})^2 > 0$. This is condition (10) obtained
previously by a different line of reasoning.

It may be observed, finally, that the inverse problem, viz., to determine
uncertainty intervals for observed y values corresponding to given x values [2] is
not a classical case of interval estimation, since it is concerned with bracketing 
a random variable, not a population parameter, by means of two statistics. 
Intervals of this type are discussed by Weiss [7]. 

Applications of the procedure outlined in this note to a problem in chemistry 
are discussed elsewhere [5]. 
A NOTE ON INCOMPLETE BLOCK DESIGNS


By A. M. Kshirsagar
University of Bombay


1. Introduction. Kempthorne [1] has shown the efficiency factor of an incomplete
block design to be a quantity proportional to the harmonic mean of the non-zero
latent roots of the matrix of coefficients of the reduced normal equations
for the intra-block estimates of treatment effects. He has further stated that the
geometric mean in a certain sense corresponds to the generalized variance but
has not explicitly explained it. The present note is intended to clear up this point
and to prove that the design with highest efficiency factor (in any case, whether
the harmonic mean or the geometric mean is taken as a measure of efficiency) is

(a) a balanced incomplete block design, if such a design exists; and
(b) a Youden Square, if it exists, among designs in which heterogeneity is
eliminated in two directions.

There is some overlap between this paper and the ones by Kiefer and by 
Mote in this issue. 


2. Incomplete block design. Let there be $v$ treatments and $b$ blocks of $k$ plots
each. Let $r$ be the number of replications of each treatment and let $N$ be the
incidence matrix of the design (rows refer to the treatments and columns to blocks).
Each element of $N$, for an incomplete block design, is either 0 or 1. Then the
matrix of coefficients of the reduced normal equations for the intra-block
estimates $t_i$ of the treatment effects is


      $C = r I_v - \frac{1}{k} N N',$


where $I_v$ denotes the identity matrix of order $v$. For any design, $C$ has one zero
latent root, the corresponding latent vector having all the elements equal. Let
the non-zero roots of $C$ be $\lambda_1, \lambda_2, \ldots, \lambda_{v-1}$ and let $m_1, m_2, \ldots, m_{v-1}$ be the
corresponding orthogonal normalized latent vectors (column). Kempthorne
[1] chooses the average variance of elementary treatment contrasts like $t_i - t_j$
to arrive at the harmonic mean of the $\lambda$'s as a definition of efficiency factor of
a design. The author, however, feels that, instead, a complete set of $v - 1$
orthogonal normalized treatment contrasts should be chosen because

(1) their average variance leads to the harmonic mean of the $\lambda$'s; and

(2) their generalized variance leads to the geometric mean of the $\lambda$'s, as a
criterion to measure the efficiency of a design.

Let $l_i$ ($i = 1, 2, \ldots, v - 1$) be orthogonal normalized column vectors so that
$l_i' t$, where $t' = (t_1, t_2, \ldots, t_v)$, form a complete set of orthogonal normalized
treatment contrasts. Then, if we observe that


      $\sum_{i=1}^{v-1} l_i l_i' = I_v - \frac{1}{v} E_{vv},$


and use Kempthorne's [1] result about average variance of $t_i - t_j$, it follows
readily that the average variance of a full set of orthogonal normalized
treatment contrasts is proportional to the harmonic mean of $\lambda_1, \lambda_2, \ldots, \lambda_{v-1}$.
However, if we consider the generalized variance of $l_i' t$ ($i = 1, 2, \ldots, v - 1$),
it can be shown that it is proportional to $(\lambda_1 \lambda_2 \cdots \lambda_{v-1})^{-1}$. This can be
proved by using the fact that the transformation from $l_i' t$ ($i = 1, 2, \ldots, v - 1$)
to $m_i' t$ ($i = 1, 2, \ldots, v - 1$) is orthogonal and that


      $V(m_i' t) = \frac{\sigma^2}{\lambda_i},$

      $\mathrm{Cov}(m_i' t, m_j' t) = 0, \qquad i \ne j,$


where $\sigma^2$ is the variance of the yield of a plot.
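Thus, either $\sum_{i=1}^{v-1} 1/\lambda_i$ or $(\lambda_1 \lambda_2 \cdots \lambda_{v-1})^{-1}$ can be taken as a measure of
efficiency of the design. It should be noted that


      $\operatorname{trace} C = vr\left(1 - \frac{1}{k}\right).$


Hence to obtain a design with highest efficiency we have to minimise either
$\sum_{i=1}^{v-1} 1/\lambda_i$ or $(\lambda_1 \lambda_2 \cdots \lambda_{v-1})^{-1}$ subject to the condition that


      $\sum_{i=1}^{v-1} \lambda_i = \text{constant}.$


This immediately leads to

      $\lambda_1 = \lambda_2 = \cdots = \lambda_{v-1}$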

and consequently, 


      $C = \frac{vr}{v - 1}\left(1 - \frac{1}{k}\right)\left(I_v - \frac{1}{v} E_{vv}\right),$


where $E_{pq}$ denotes a $p \times q$ matrix, all the elements of which are unity. This
proves, therefore, that the design with the highest efficiency is a balanced
incomplete block design, if such a design exists.
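
As a numerical illustration of this argument, the sketch below compares a balanced design with an unbalanced one having the same $v$, $b$, $r$, $k$. Both example designs and the helper names (`c_matrix`, `nonzero_roots`, `criteria`) are assumptions made for illustration only; $C$ is computed as $rI_v - NN'/k$ from the incidence matrix.

```python
import numpy as np
from itertools import combinations

def c_matrix(blocks, v):
    """C = diag(r) - N N'/k for a binary design with equal block size k."""
    k = len(blocks[0])
    N = np.zeros((v, len(blocks)))
    for j, blk in enumerate(blocks):
        for i in blk:
            N[i, j] = 1.0
    return np.diag(N.sum(axis=1)) - N @ N.T / k

def nonzero_roots(C, tol=1e-9):
    """Non-zero latent roots of the symmetric matrix C."""
    roots = np.linalg.eigvalsh(C)
    return roots[roots > tol]

def criteria(blocks, v):
    """Return (sum of 1/lambda_i, 1/product of lambda_i); smaller is better."""
    lam = nonzero_roots(c_matrix(blocks, v))
    return float(np.sum(1.0 / lam)), float(1.0 / np.prod(lam))

# Balanced: all pairs of 4 treatments (v=4, b=6, r=3, k=2).
bibd = list(combinations(range(4), 2))
# Unbalanced but connected, with the same v, b, r, k (hypothetical example).
unbalanced = [(0, 1), (0, 1), (0, 2), (1, 3), (2, 3), (2, 3)]

for name, design in [("balanced", bibd), ("unbalanced", unbalanced)]:
    print(name, nonzero_roots(c_matrix(design, 4)), criteria(design, 4))
```

Both designs have the same trace of $C$, so the comparison isolates the effect of unequal roots on the two criteria.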


3. Designs in which heterogeneity is eliminated in two directions. Let there
be $UU'$ plots arranged in $U$ rows and $U'$ columns, and let $v$ treatments be
assigned to these plots in such a way that every treatment is replicated $r$ times
and the ith treatment occurs $l_{ij}$ times in the jth row and $m_{ik}$ times in the kth
column ($i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, U$; $k = 1, 2, \ldots, U'$) where $l_{ij}$ and
$m_{ik}$ are either 0 or 1. Let $L = [l_{ij}]$ and $M = [m_{ik}]$. Then the matrix of coefficients
of reduced normal equations for treatment effects after eliminating row and
column effects is


      $C_0 = r I_v - \frac{1}{U'} L L' - \frac{1}{U} M M' + \frac{r^2}{UU'} E_{vv}.$
This matrix $C_0$ plays the same role as $C$ in section 2. Hence for a design of this
type, the efficiency is maximum if all the non-zero latent roots of $C_0$ are equal,
the common value being


      $\frac{\operatorname{trace} C_0}{v - 1} = \frac{vr}{v - 1}\left(1 - \frac{1}{U} - \frac{1}{U'} + \frac{1}{v}\right) = \theta$, say.


It therefore follows that for designs in which heterogeneity is eliminated in
two directions, the efficiency factor is maximum if


      $\frac{1}{U'} L L' + \frac{1}{U} M M'$ is of the form

      $\begin{pmatrix} p & q & q & \cdots & q \\ q & p & q & \cdots & q \\ \vdots & & & & \vdots \\ q & q & q & \cdots & p \end{pmatrix}.$


It should be observed that, for a Youden Square (where the rows are complete
blocks and columns form a symmetrical balanced incomplete block design),

      $U = r, \qquad U' = v$

and

      $L = E_{vr}$

and

      $M M' = \begin{pmatrix} r & \lambda & \lambda & \cdots & \lambda \\ \lambda & r & \lambda & \cdots & \lambda \\ \vdots & & & & \vdots \\ \lambda & \lambda & \lambda & \cdots & r \end{pmatrix},$


and $LL'/U' + MM'/U$ is of the required form. Consequently, among designs
in which heterogeneity is eliminated in two directions, a Youden Square, if it
exists, has maximum efficiency.

Acknowledgement: I am indebted to Prof. M. C. Chakrabarti and the referee 
for their valuable help and suggestions in the preparation of this note. 
ON A MINIMAX PROPERTY OF A BALANCED INCOMPLETE
BLOCK DESIGN


By V. L. Mote
Institute of Statistics, Raleigh, North Carolina


Summary. It is shown that for a given set of parameters (b blocks, k plots per
block and v treatments), among the class of connected incomplete block designs,
a balanced incomplete block design (if it exists) is the design which maximizes
the minimum efficiency, efficiency being defined as


      $\frac{\text{Variance of an estimated treatment contrast in a randomized block}}{\text{Variance of the estimated treatment contrast in the incomplete block}}.$

The proof will be preceded by a lemma. 

Notation. Capital letters will be used to denote matrices and boldface small 
letters to denote vectors. At times a matrix of m rows and n columns will be 
denoted by A(m X n). 

LEMMA. If $B(p \times p)$ is real symmetric and at least positive semidefinite of rank
$r$ ($\le p$), then:


(i) The stationary values of


      $\frac{a'(1 \times p)\, B(p \times p)\, a(p \times 1)}{a'a}$

under the variation of $a$ (over all non-null $a$ excepting the solutions of $Ba = 0$) are
the characteristic roots of $B$.

(ii) In particular the largest and the smallest values of $a'Ba/a'a$ (under the
variation of all non-null $a$ excepting the solutions of $Ba = 0$), are the largest and
the smallest non-zero characteristic roots of $B$.

(iii) $a'Ba/a'a$ attains its maximum (or minimum) value if and only if $a$ is a
latent vector corresponding to the maximum (or minimum) latent root of $B$.

For a proof of this lemma we refer to S. N. Roy [3] and H. W. Turnbull and 
A. C. Aitken [4]. 

Let us adopt the following notation:


$\lambda_{i\alpha}$ = number of blocks in which the ith and the $\alpha$th treatments appear together.

$r_i$ = number of blocks in which the ith treatment appears.

$c_{i\alpha} = -\lambda_{i\alpha}/k$ for $i \ne \alpha$, and $c_{i\alpha} = r_i (1 - 1/k)$ for $i = \alpha$.

$T_i$ = total yield of the ith treatment.

$B_j$ = total yield of the jth block.

$n_{ij}$ = 1 if the ith treatment appears in the jth block, 0 otherwise.

$Q_i = T_i - \frac{1}{k} \sum_j n_{ij} B_j.$


Finally let
      $Q'(1 \times v) = (Q_1 \; Q_2 \; \cdots \; Q_v).$


In any connected incomplete block design the adjusted normal equations
are given by
      $Ct = Q$
where


      $C = (c_{i\alpha}), \qquad i, \alpha = 1, 2, \ldots, v,$

      $t'(1 \times v) = (t_1, t_2, \ldots, t_v).$


It is well known that $C$ is symmetric positive semidefinite of rank $v - 1$ and
that the only independent non-trivial solution of the equations $Cx = 0$ is


      $x'(1 \times v) = (1, 1, \ldots, 1).$


Let $m'(1 \times v) = (m_1, \ldots, m_v)$ be a non-null vector such that $\sum_i m_i = 0$.
It is well known (e.g., see R. C. Bose and S. Ehrenfeld) that the variance of
the "best estimate" of $m't$ is given by $\theta' C \theta\, \sigma^2$ where $\theta$ is a solution of $C\theta = m$.
We shall now show that


      $\sup_{m \in M} \frac{\theta' C \theta}{m'm} = \frac{1}{\lambda_{\min}},$

where $M$ is the class of all non-null vectors $m'(1 \times v) = (m_1, m_2, \ldots, m_v)$ such
that $\sum_i m_i = 0$ and $\lambda_{\min}$ is the smallest of the $v - 1$ non-zero characteristic
roots of $C$.


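Since $C$ is real symmetric, it follows that there exists an orthogonal matrix
$P(v \times v)$ such that


      $P'CP = \begin{pmatrix} D_\lambda\{(v-1) \times (v-1)\} & 0[(v-1) \times 1] \\ 0[1 \times (v-1)] & 0 \end{pmatrix},$

where $D_\lambda$ is a diagonal matrix, the diagonal elements being $\lambda_1, \lambda_2, \ldots, \lambda_{v-1}$,
the non-zero latent roots of $C$. Let
      $P = [P_1\{v \times (v - 1)\} \;\vdots\; q(v \times 1)].$
Then $C = P_1 D_\lambda P_1'$.
It can be easily shown that


(2)   $P_1 P_1' + q q' = I,$
(3)   $P_1' P_1 = I,$


and that the rank of $P_1$ is $v - 1$ and


(4)   $q'(1 \times v) = \frac{1}{\sqrt{v}} (1, 1, \ldots, 1).$


It can be seen that
      $(P_1 D_\lambda^{-1} P_1')\, m$
is a solution of $C\theta = m$, and

      $\frac{\theta' C \theta}{m'm} = \frac{m' P_1 D_\lambda^{-1} P_1' m}{m'm}.$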
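Hence by virtue of the lemma stated earlier we have


      $\sup_{m \in M} \frac{m'(P_1 D_\lambda^{-1} P_1')\, m}{m'm} = \frac{1}{\lambda_{\min}}.$


The variance of the "best estimate" of $m't$ in a randomized block is
$(1/b)\, m'm\, \sigma^2$.
Hence,


      efficiency $= \frac{(1/b)\, m'm}{\theta' C \theta},$


where $\theta$ is a solution of $C\theta = m$. Now


      $\inf_{m \in M} \frac{m'm}{b\, \theta' C \theta} = \frac{1}{b} \cdot \frac{1}{\sup_{m \in M} (\theta' C \theta / m'm)} = \frac{\lambda_{\min}}{b}.$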


Hence, minimum efficiency $= \lambda_{\min}/b$. It can be shown that for any connected
design $\lambda_{\min} \le \lambda v/k$, where


      $\lambda = \frac{bk(k - 1)}{v(v - 1)}.$


Now if we can show that $\lambda_{\min} = \lambda v/k$ if and only if the design is a balanced
incomplete block design, then our problem is solved. If the design is a balanced
incomplete block design, then $\lambda_{\min} = \lambda v/k$, since $\lambda v/k$ is a latent root of
multiplicity $v - 1$ for the $C$ corresponding to the given design. The next thing we
have to show is that if $\lambda_{\min} = \lambda v/k$, then the design is a balanced incomplete
block design. Since $\lambda_{\min} = \lambda v/k$, and the $v - 1$ non-zero roots sum to
trace $C = b(k - 1) = (v - 1)\lambda v/k$, it follows that all of the remaining $v - 2$ roots
must be exactly $\lambda v/k$. Hence


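      $C = P_1 D_\lambda P_1' = \frac{\lambda v}{k} P_1 P_1'.$


By virtue of equations (2) and (4) we have


      $P_1 P_1' = I - \frac{1}{v} J,$


where $J$ is a matrix of dimensions $v \times v$ in which every element is unity. Hence


      $C = \frac{\lambda v}{k} \left[ I - \frac{1}{v} J \right].$


Thus $\lambda_{i\alpha} = \lambda$ for all $i \ne \alpha$; hence the result.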
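
The quantities in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: the two example designs and the helper names (`c_matrix`, `min_efficiency`, `variance_ratio`) are hypothetical, and a solution of $C\theta = m$ is built as $(P_1 D_\lambda^{-1} P_1')m$ from an eigendecomposition, following the construction described above.

```python
import numpy as np
from itertools import combinations

def c_matrix(blocks, v):
    """C = diag(r_i) - N N'/k for a binary design with block size k."""
    k = len(blocks[0])
    N = np.zeros((v, len(blocks)))
    for j, blk in enumerate(blocks):
        N[list(blk), j] = 1.0
    return np.diag(N.sum(axis=1)) - N @ N.T / k

def min_efficiency(blocks, v):
    """lambda_min / b for a connected design (roots sorted ascending)."""
    roots = np.linalg.eigvalsh(c_matrix(blocks, v))
    return roots[1] / len(blocks)  # roots[0] is the zero root

def variance_ratio(blocks, v, m):
    """theta' C theta / m'm with theta = P1 D^{-1} P1' m."""
    C = c_matrix(blocks, v)
    w, V = np.linalg.eigh(C)
    P1, d = V[:, 1:], w[1:]        # drop the zero root and its vector
    theta = P1 @ ((P1.T @ m) / d)
    assert np.allclose(C @ theta, m)  # theta solves C theta = m for a contrast m
    return float(theta @ C @ theta / (m @ m))

v, k = 4, 2
bibd = list(combinations(range(v), 2))                      # b = 6
other = [(0, 1), (0, 1), (0, 2), (1, 3), (2, 3), (2, 3)]    # same b, k, v
b = len(bibd)
bound = (b * k * (k - 1) / (v * (v - 1))) * v / k           # lambda * v / k

print(min_efficiency(bibd, v), min_efficiency(other, v), bound / b)
print(variance_ratio(other, v, np.array([1.0, -1.0, 0.0, 0.0])))
```

For the balanced design the minimum efficiency attains the bound $(\lambda v/k)/b$; for the unbalanced design it falls below it.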
Acknowledgements: I would like to thank Dr. E. J. Williams and Dr. R. C. 
Bose for their suggestions and criticisms. 
A CHARACTERIZATION OF THE NORMAL DISTRIBUTION¹


By J. N. K. Rao 
Forest Research Institute, Dehra Dun, India 


1. Introduction. Using characteristic functions Lukacs [3] has shown that a 
necessary and sufficient condition for the independence of the sample mean and 
variance is that the parent population be normal. Geisser [2] has derived a simi- 
lar theorem concerning the sample mean and the first order mean square suc- 
cessive difference. In section 2 of this note a general theorem of which Lukacs’ 
and Geisser’s results are particular cases has been proved. 

Lukacs [3] has extended his theorem to the multivariate case, namely, that a 
necessary and sufficient condition that the sample mean vector is distributed 
independently of the variance-covariance matrix is that the parent population 
be multivariate normal. In section 3, the general theorem of section 2 is
extended to the multivariate population of which Lukacs' theorem for the
multivariate population is a particular case. To prove the necessity of this theorem,
we extend, to the multivariate case, Daly's [1] result that if $f(x)$ is the normal
density, then the sample mean and $g(x_1, \ldots, x_n)$ are independently distributed
where $g(x_1, \ldots, x_n) = g(x_1 + a, \ldots, x_n + a)$.


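2. Univariate case. Let $x_1, \ldots, x_n$ be independent and identically distributed
with density function $f(x)$ and mean $\mu$ and variance $\sigma^2$.
Let

(2.1)   $\bar{x} = n^{-1} \sum_{j=1}^{n} x_j$

and

(2.2)   $\delta^2 = \left( \sum_{t=1}^{m} \sum_{j=1}^{n} l_{tj}^2 \right)^{-1} \sum_{t=1}^{m} (l_{t1} x_1 + \cdots + l_{tn} x_n)^2, \qquad m \ge 1,$

where


      $\sum_{j=1}^{n} l_{tj} = 0 \qquad \text{for } t = 1, \ldots, m.$


The following theorem is proved.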

The following theorem is proved. 
¹ Supported by a Senior Research Training Scholarship from the Government of India.


THEOREM 1. A necessary and sufficient condition that $f(x)$ be the normal density
is that $\bar{x}$ and $\delta^2$ are independent.
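PROOF. Following Lukacs [3] we derive the sufficiency. Now,


      $E(\delta^2) = \left( \sum_{t=1}^{m} \sum_{j=1}^{n} l_{tj}^2 \right)^{-1} \left\{ \sum_{t=1}^{m} \sum_{j=1}^{n} l_{tj}^2 E(x_j^2) + \sum_{t=1}^{m} \sum_{j \ne j'} l_{tj} l_{tj'} E(x_j x_{j'}) \right\} = \sigma^2.$


The joint characteristic function of $\bar{x}$ and $\delta^2$ is


      $\phi(t_1, t_2) = \int \cdots \int e^{i t_1 \bar{x} + i t_2 \delta^2} f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n.$


Therefore

(2.3)   $\left. \frac{\partial}{\partial t_2} \phi(t_1, t_2) \right|_{t_2 = 0} = \phi_1(t_1) \left. \frac{\partial}{\partial t_2} \phi_2(t_2) \right|_{t_2 = 0},$

where

      $\phi_1(t_1) = [\psi(t_1/n)]^n$

and

      $\psi(t_1) = \int e^{i t_1 x} f(x)\, dx.$

Now

      $\left. \frac{\partial}{\partial t_2} \phi(t_1, t_2) \right|_{t_2 = 0} = i \left( \sum_t \sum_j l_{tj}^2 \right)^{-1} \left\{ \left( \sum_t \sum_j l_{tj}^2 \right) [\psi(t_1/n)]^{n-1} \int x^2 e^{i t_1 x/n} f(x)\, dx \right.$
      $\left. \qquad + \left( \sum_t \sum_{j \ne j'} l_{tj} l_{tj'} \right) [\psi(t_1/n)]^{n-2} \left[ \int x e^{i t_1 x/n} f(x)\, dx \right]^2 \right\}$

      $= i \left\{ [\psi(t_1/n)]^{n-1} \int x^2 e^{i t_1 x/n} f(x)\, dx - [\psi(t_1/n)]^{n-2} \left[ \int x e^{i t_1 x/n} f(x)\, dx \right]^2 \right\}$

and

      $\left. \frac{\partial}{\partial t_2} \phi_2(t_2) \right|_{t_2 = 0} = i \sigma^2.$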


Hence, Eq. (2.3) reduces to


(2.5)   $-\psi(t) \frac{d^2 \psi(t)}{dt^2} + \left[ \frac{d\psi(t)}{dt} \right]^2 = \sigma^2 [\psi(t)]^2,$


the solution of which is the characteristic function of the normal distribution.
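
The last step can be made explicit with a short worked calculation. It is a sketch only: it reads (2.5) as $-\psi\psi'' + (\psi')^2 = \sigma^2\psi^2$ and assumes $\psi(t) \ne 0$ in a neighborhood of the origin, so that $\log\psi$ is defined there.

```latex
% Divide (2.5) by -[\psi(t)]^2 and recognise the second derivative of log psi:
\frac{d^{2}}{dt^{2}}\log\psi(t)
   = \frac{\psi''(t)}{\psi(t)} - \left[\frac{\psi'(t)}{\psi(t)}\right]^{2}
   = -\sigma^{2}.
% Integrate twice, using \psi(0) = 1 and \psi'(0) = i\mu:
\log\psi(t) = i\mu t - \tfrac{1}{2}\sigma^{2}t^{2}
\quad\Longrightarrow\quad
\psi(t) = \exp\!\left\{ i\mu t - \tfrac{1}{2}\sigma^{2}t^{2} \right\}.
```

The two initial conditions are the standard ones for a characteristic function of a distribution with mean $\mu$.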
The necessary condition follows from Daly [1] who has proved that $\bar{x}$ and
$g(x_1, \ldots, x_n)$ are independent in the normal case, if


      $g(x_1, \ldots, x_n) = g(x_1 + a, \ldots, x_n + a).$


Since $\delta^2$ is invariant under a translation, the theorem is proved.
In fact, the above result can easily be extended² to cover a more general
class of quadratic forms, namely those which are invariant under a translation
and have non-zero expected values. For, Lukacs' method can be applied even
when $\delta^2$ is defined as follows:


(2.6)   $\delta^2 = \left( \sum_{t=1}^{m} \sum_{j=1}^{n} a_{tjj} \right)^{-1} \sum_{t=1}^{m} \left\{ \sum_{j, j' = 1}^{n} a_{tjj'} x_j x_{j'} \right\},$


where $\sum_{j'=1}^{n} a_{tjj'} = 0$ ($t = 1, \ldots, m$; $j = 1, \ldots, n$), provided
$\sum_{j=1}^{n} a_{tjj} \ne 0$ ($t = 1, \ldots, m$).
It will be noted that $\delta^2$ defined in (2.2) above is a special case of $\delta^2$ defined
in (2.6) by putting $a_{tjj'} = l_{tj} l_{tj'}$.

Particular Cases.
(a) To obtain Lukacs' result, put


      $l_{tj} = 1 - \frac{1}{n}$   for $t = j$,

      $l_{tj} = -\frac{1}{n}$   for $t \ne j$,


and $m = n$.
(b) To get Geisser's result, put
      $l_{tj} = 1$   when $j = t + 1$,
      $l_{tj} = -1$   when $j = t$,
      $l_{tj} = 0$   for other values of $j$,
and $m = n - 1$.


(c) An interesting extension of Geisser's result is: a necessary and sufficient
condition for the independence of the sample mean and any order mean square
successive difference is that the parent population be normal.

The rth order mean square successive difference is given by


      $\delta_r^2 = (n - r)^{-1} \left\{ \binom{r}{0}^2 + \binom{r}{1}^2 + \cdots + \binom{r}{r}^2 \right\}^{-1} \sum_{i=1}^{n-r} (\Delta^r x_i)^2,$

where

      $\Delta^r x_i = \binom{r}{0} x_{i+r} - \binom{r}{1} x_{i+r-1} + \cdots + (-1)^r \binom{r}{r} x_i.$


To get the above result, put
      $l_{tj} = (-1)^{r-j+t} \binom{r}{j-t}$   when $t \le j \le t + r$,
      $l_{tj} = 0$   when $1 \le j \le t - 1$ and $t + r + 1 \le j \le n$,
and $m = n - r$.
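
The three particular cases can be checked numerically against definition (2.2). The sketch below is illustrative only: the function names (`delta2`, `lukacs_coeffs`, `successive_diff_coeffs`) and the sample values are hypothetical, and rows and columns are indexed from 0 rather than 1.

```python
import numpy as np
from math import comb

def delta2(x, L):
    """delta^2 of (2.2): sum_t (sum_j l_tj x_j)^2 / sum_t sum_j l_tj^2."""
    x = np.asarray(x, dtype=float)
    L = np.asarray(L, dtype=float)
    if not np.allclose(L.sum(axis=1), 0.0):
        raise ValueError("each row of coefficients must sum to zero")
    return float(((L @ x) ** 2).sum() / (L ** 2).sum())

def lukacs_coeffs(n):
    """Case (a): l_tj = 1 - 1/n for t = j and -1/n otherwise, m = n."""
    return np.eye(n) - np.ones((n, n)) / n

def successive_diff_coeffs(n, r):
    """Case (c): l_tj = (-1)^(r-(j-t)) C(r, j-t) for t <= j <= t+r, m = n - r."""
    L = np.zeros((n - r, n))
    for t in range(n - r):
        for j in range(t, t + r + 1):
            L[t, j] = (-1) ** (r - (j - t)) * comb(r, j - t)
    return L

x = np.array([2.0, 3.5, 1.0, 4.0, 6.5, 5.0])   # arbitrary sample
n = len(x)
print(delta2(x, lukacs_coeffs(n)), np.var(x, ddof=1))                  # sample variance
print(delta2(x, successive_diff_coeffs(n, 1)),
      (np.diff(x) ** 2).sum() / (2 * (n - 1)))                         # Geisser, r = 1
print(delta2(x, successive_diff_coeffs(n, 2)),
      (np.diff(x, n=2) ** 2).sum() / (comb(4, 2) * (n - 2)))           # r = 2
```

Case (b) is recovered as `successive_diff_coeffs(n, 1)`, and the normalizing sum of squared binomial coefficients for order $r$ equals $\binom{2r}{r}$.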


² I am indebted to the referee for pointing this out.
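3. Multivariate case. The same reasoning applies also to the multivariate
case. Denote by $x_{\alpha i}$ ($\alpha = 1, \ldots, n$; $i = 1, \ldots, p$) the $\alpha$th observation on the ith
variate, by $\bar{x}_i$ the sample mean of the ith variate, and let


(3.1)   $\delta_{ij} = \left( \sum_{t=1}^{m} \sum_{\alpha=1}^{n} l_{t\alpha}^2 \right)^{-1} \sum_{t=1}^{m} \left\{ \left( \sum_{\alpha=1}^{n} l_{t\alpha} x_{\alpha i} \right) \left( \sum_{\alpha=1}^{n} l_{t\alpha} x_{\alpha j} \right) \right\},$


or more generally,


(3.2)   $\delta_{ij} = \left( \sum_{t=1}^{m} \sum_{\alpha=1}^{n} a_{t\alpha\alpha} \right)^{-1} \sum_{t=1}^{m} \left\{ \sum_{\alpha, \alpha'} a_{t\alpha\alpha'} x_{\alpha i} x_{\alpha' j} \right\} \qquad (i, j = 1, \ldots, p),$


where $\sum_{\alpha'=1}^{n} a_{t\alpha\alpha'} = 0$ ($t = 1, \ldots, m$; $\alpha = 1, \ldots, n$), provided
$\sum_{\alpha=1}^{n} a_{t\alpha\alpha} \ne 0$ ($t = 1, \ldots, m$).


Assuming that the distribution of $[\delta_{ij}]_{p \times p}$ is independent of the joint
distribution of the $p$ sample means $(\bar{x}_1, \ldots, \bar{x}_p)$ one obtains the equation,


(3.3)   $\frac{\psi_{ij}}{\psi} - \frac{\psi_i \psi_j}{\psi^2} = -\lambda_{ij},$

where $\lambda_{ij}$ is the population covariance of the variates $x_i$ and $x_j$,


      $\psi = \psi(t_1, \ldots, t_p) = \int \cdots \int \exp\left\{ i \sum_{j=1}^{p} t_j x_j \right\} f(x_1, \ldots, x_p)\, dx_1 \cdots dx_p,$


      $\psi_i = \frac{\partial \psi}{\partial t_i}, \qquad \psi_{ij} = \frac{\partial^2 \psi}{\partial t_i\, \partial t_j}.$

If (3.3) is true for $i, j = 1, \ldots, p$, one has a set of partial differential equations
which leads to the characteristic function of the multivariate normal
distribution.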

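To prove the necessity, we give an extension of Daly's [1] lemma of which
it is a particular case.


THEOREM 2. Let $g_l(x_{11}, \ldots, x_{n1}; \ldots; x_{1p}, \ldots, x_{np})$, $l = 1, \ldots, r$, be
functions of $(x_{11}, \ldots, x_{n1}); \ldots; (x_{1p}, \ldots, x_{np})$ such that

      $g_l(x_{11} + a_1, \ldots, x_{n1} + a_1; \ldots; x_{1p} + a_p, \ldots, x_{np} + a_p) = g_l(x_{11}, \ldots, x_{n1}; \ldots; x_{1p}, \ldots, x_{np}).$

The sample means $(\bar{x}_1, \ldots, \bar{x}_p)$ are distributed independently of these $r$ functions
if $f(x_1, \ldots, x_p)$ is a p-variate normal density.

PROOF. The joint characteristic function is

      $\phi(t_1, \ldots, t_p; \xi_1, \ldots, \xi_r) = \text{const.} \int \cdots \int \exp\left\{ i \sum_{i=1}^{p} t_i \sum_{\alpha=1}^{n} x_{\alpha i}/n \right\} \exp\left\{ i \sum_{l=1}^{r} \xi_l g_l \right\}$
      $\qquad \times \exp\left\{ -\frac{1}{2} \sum_{\alpha=1}^{n} \sum_{i,j=1}^{p} \lambda^{ij} x_{\alpha i} x_{\alpha j} \right\} \prod_{\alpha=1}^{n} [dx_{\alpha 1} \cdots dx_{\alpha p}],$


where $i^2 = -1$ and $\lambda^{ij}$ denotes the elements of the inverse of the matrix $[\lambda_{ij}]$.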
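Make the contragredient transformation


      $x_{\alpha i} = \sum_{j=1}^{p} c_{ij} y_{\alpha j}, \qquad t_i = \sum_{j=1}^{p} c_{ij} u_j.$


Then,


      $\phi(t_1, \ldots, t_p; \xi_1, \ldots, \xi_r) = \text{const.} \int \cdots \int \exp\left\{ i \sum_{i=1}^{p} u_i \sum_{\alpha=1}^{n} y_{\alpha i}/n \right\} \exp\left\{ i \sum_{l=1}^{r} \xi_l g_l' \right\}$
      $\qquad \times \exp\left\{ -\frac{1}{2} \sum_{\alpha=1}^{n} \sum_{i=1}^{p} y_{\alpha i}^2/\rho_i \right\} \prod_{\alpha=1}^{n} [dy_{\alpha 1} \cdots dy_{\alpha p}],$

where $\rho_1, \ldots, \rho_p$ are latent roots of the variance-covariance matrix and


      $g_l'(y_{11} + a_1, \ldots, y_{n1} + a_1; \ldots; y_{1p} + a_p, \ldots, y_{np} + a_p) = g_l'(y_{11}, \ldots, y_{n1}; \ldots; y_{1p}, \ldots, y_{np}).$

Put

      $Z_{\alpha i} = \frac{y_{\alpha i}}{\sqrt{\rho_i}} - \frac{\sqrt{-1}\; u_i \sqrt{\rho_i}}{n};$

then

      $\phi(t_1, \ldots, t_p; \xi_1, \ldots, \xi_r) = \text{const.} \int \cdots \int \exp\left\{ -\frac{1}{2n} \sum_{i,j=1}^{p} \lambda_{ij} t_i t_j \right\} \exp\left\{ i \sum_{l=1}^{r} \xi_l g_l'' \right\}$
      $\qquad \times \exp\left\{ -\frac{1}{2} \sum_{\alpha=1}^{n} \sum_{i=1}^{p} Z_{\alpha i}^2 \right\} \prod_{\alpha=1}^{n} [dZ_{\alpha 1} \cdots dZ_{\alpha p}],$


where


      $g_l'' = g_l'(Z_{11}\sqrt{\rho_1}, \ldots, Z_{n1}\sqrt{\rho_1}; \ldots; Z_{1p}\sqrt{\rho_p}, \ldots, Z_{np}\sqrt{\rho_p})$


and hence is a function of $(Z_{11}, \ldots, Z_{n1}); \ldots; (Z_{1p}, \ldots, Z_{np})$ only.
Therefore,


      $\phi(t_1, \ldots, t_p; \xi_1, \ldots, \xi_r) = \exp\left\{ -\frac{1}{2n} \sum_{i,j=1}^{p} \lambda_{ij} t_i t_j \right\} \times$ (a function of $\xi_1, \ldots, \xi_r$ only).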
Hence the theorem. 
Particular case. The sample mean vector $(\bar{x}_1, \ldots, \bar{x}_p)$ is distributed
independently of product moments of any order if $f(x_1, \ldots, x_p)$ is a p-variate normal
density.


4. Acknowledgement. My thanks are due to Dr. K. R. Nair for his valuable 
suggestions and to the referee for his helpful comments. 
A NOTE ON P.B.I.B. DESIGN MATRICES


By W. A. Thompson, Jr.
University of Delaware


Summary. The notation P.B.I.B. (m) will mean partially balanced incom- 
plete block design with m associative classes. 

It is found that the C matrix of a P.B.I.B. (m) may be expressed as a linear
function of m + 1 commutative and linearly independent matrices. The author 
feels that this decomposition may be of interest to those studying the properties 
of P.B.I.B. designs. 


1. The C matrix of a P.B.I.B. design. The reader should review the definition
of partially balanced designs, and the relations among the parameters. See,
for example, Bose and Shimamoto [2], or Bose [1], or Connor and Clatworthy
[3].

The matrix


      $C = (c_{ij}),$


where
      $c_{ii} = r(1 - 1/k),$
      $c_{ij} = -\lambda_{ij}/k, \qquad i \ne j,$


is of special interest in incomplete block design theory.
In the case of a P.B.I.B. (m), the C matrix may be written in a particular
form. We may write


(1.1)   $kC = r(k - 1) I - \sum_{i=1}^{m} \lambda_i B_i,$

where $B_s = [b_{ij}^{(s)}]$ for $s = 1, \ldots, m$, where $b_{ii}^{(s)} = 0$ and $b_{ij}^{(s)} = 1$ or 0
according as the treatments $i$ and $j$ are or are not sth associates. Note that $I, B_1,
B_2, \ldots, B_m$ form a linearly independent set of matrices since a one in the
$(i, j)$th position of any of them implies a zero in the $(i, j)$th position of all the
others. $b_{hj}^{(s)} b_{ht}^{(s)}$ equals 1 if treatment $j$ and treatment $t$ are both sth associates
of treatment $h$, but equals 0 otherwise. If $j \ne t$ then $\sum_h b_{hj}^{(s)} b_{ht}^{(s)}$ is the number
of treatments which are sth associates of both treatments $j$ and $t$. But if $j$ and $t$
are rth associates, then this is the definition of $p^r_{ss}$. Note further that if $j = t$
then $\sum_h \{b_{hj}^{(s)}\}^2 = n_s$. Thus


(1.2)   $B_s B_s = \left[ \sum_h b_{jh}^{(s)} b_{ht}^{(s)} \right] = \left[ \sum_h b_{hj}^{(s)} b_{ht}^{(s)} \right] = n_s I + \sum_{i=1}^{m} p^i_{ss} B_i.$

Similarly


(1.3)   $B_s B_t = \left[ \sum_h b_{ih}^{(s)} b_{hj}^{(t)} \right] = \sum_{i=1}^{m} p^i_{st} B_i, \qquad s \ne t.$
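Consider the equations


      $C = r(1 - 1/k) I - \frac{1}{k} \sum_{i=1}^{m} \lambda_i B_i,$

      $C B_j = r(1 - 1/k) B_j - \frac{1}{k} \sum_{i=1}^{m} \lambda_i B_i B_j$

      $\quad = r(1 - 1/k) B_j - \frac{1}{k} \sum_{i \ne j} \lambda_i \left( \sum_{s=1}^{m} p^s_{ij} B_s \right) - \frac{\lambda_j}{k} \left( n_j I + \sum_{s=1}^{m} p^s_{jj} B_s \right)$

      $\quad = -\frac{n_j \lambda_j}{k} I + \left[ r(1 - 1/k) - \frac{1}{k} \sum_{i=1}^{m} \lambda_i p^j_{ij} \right] B_j - \frac{1}{k} \sum_{s \ne j} \left( \sum_{i=1}^{m} \lambda_i p^s_{ij} \right) B_s.$


We may rewrite these equations as


(1.4)
      $C = d_{00} I + d_{01} B_1 + \cdots + d_{0m} B_m,$
      $C B_1 = d_{10} I + d_{11} B_1 + \cdots + d_{1m} B_m,$
      $\vdots$
      $C B_m = d_{m0} I + d_{m1} B_1 + \cdots + d_{mm} B_m,$

where

      $d_{00} = r(1 - 1/k),$
      $d_{0i} = -\lambda_i/k,$
      $d_{j0} = -n_j \lambda_j/k,$
      $d_{jj} = r(1 - 1/k) - \frac{1}{k} \sum_{i=1}^{m} \lambda_i p^j_{ij},$
      $d_{js} = -\frac{1}{k} \sum_{i=1}^{m} \lambda_i p^s_{ij}, \qquad s = 1, \ldots, m;\; j \ne s.$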

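If $e$ is arbitrary, and $I$ is the $v \times v$ identity matrix, then by subtracting $eI$ from $C$ in (1.4)
we get the single matrix equation:


(1.6)   $\begin{pmatrix} C - eI \\ (C - eI) B_1 \\ \vdots \\ (C - eI) B_m \end{pmatrix} = \begin{pmatrix} (d_{00} - e) I & d_{01} I & \cdots & d_{0m} I \\ d_{10} I & (d_{11} - e) I & \cdots & d_{1m} I \\ \vdots & & & \vdots \\ d_{m0} I & d_{m1} I & \cdots & (d_{mm} - e) I \end{pmatrix} \begin{pmatrix} I \\ B_1 \\ \vdots \\ B_m \end{pmatrix}.$

Let $D$ be the $(m + 1) \times (m + 1)$ square matrix:

(1.7)   $D = \begin{pmatrix} d_{00} & d_{01} & \cdots & d_{0m} \\ d_{10} & d_{11} & \cdots & d_{1m} \\ \vdots & & & \vdots \\ d_{m0} & d_{m1} & \cdots & d_{mm} \end{pmatrix}.$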
We could at this point use the B matrices to verify the following result:
THEOREM 1. If e is a characteristic root of C then it is a characteristic root of D,
and conversely if e is a characteristic root of D then it is a characteristic root of C.
However, this theorem also follows from Lemma 3.1 of Connor and Clatworthy
[3].
Using the matrices M and A of Lemma 3.1, with $z = kx - r(k - 1)$ we have

      $|M/k| = |xI - C|$

and

      $x\,|A/k| = |xI - D|.$


This second relation follows by first adding all other rows of $|xI - D|$ to the
first row and then subtracting the first column from all others. Theorem 1 then
follows from Connor and Clatworthy's lemma.
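
Theorem 1 can be checked numerically on a small example. The sketch below is an illustration, not part of the original argument: the group-divisible scheme on four treatments, the four-block design, and the helper name `coords` are all assumptions chosen for brevity. The entries of $D$ are read off from the expansions of $CB_j$ in the basis $I, B_1, B_2$, as in (1.4).

```python
import numpy as np

# Association scheme (assumed example): treatments 0..3 in groups {0,1}, {2,3}.
I = np.eye(4)
B1 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=float)      # first associates: same group
B2 = np.ones((4, 4)) - I - B1                   # second associates: other group
basis = [I, B1, B2]

# Design (assumed example): blocks {0,2},{0,3},{1,2},{1,3}; r = 2, k = 2.
blocks = [(0, 2), (0, 3), (1, 2), (1, 3)]
r, k, lam = 2, 2, [0, 1]                        # lambda_1 = 0, lambda_2 = 1
N = np.zeros((4, len(blocks)))
for j, blk in enumerate(blocks):
    N[list(blk), j] = 1.0
C = r * I - N @ N.T / k
# Agreement with (1.1): kC = r(k-1)I - sum_i lambda_i B_i
assert np.allclose(k * C, r * (k - 1) * I - lam[0] * B1 - lam[1] * B2)

def coords(X, basis):
    """Coefficients of X in the basis; the basis matrices have disjoint supports,
    so each coefficient can be read at any position where that matrix has a 1."""
    out = []
    for B in basis:
        i, j = np.argwhere(B == 1)[0]
        out.append(X[i, j])
    return out

# Row j of D holds the coefficients of C B_j, with B_0 = I, as in (1.4).
D = np.array([coords(C @ B, basis) for B in basis])

roots_C = np.unique(np.round(np.linalg.eigvalsh(C), 8))
roots_D = np.sort(np.round(np.linalg.eigvals(D).real, 8))
print(D)
print(roots_C, roots_D)
```

In this example the distinct characteristic roots of the $4 \times 4$ matrix $C$ coincide with the roots of the $3 \times 3$ matrix $D$, and the columns of $D$ each sum to zero, which is what makes the row-addition step described above work.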


2. The principal idempotent matrices of C. (If the reader is unfamiliar with
the properties of principal idempotent matrices, then he may consult [4].) Let
$e$ be a characteristic root of $C$, and let $E(e)$ be the principal idempotent matrix
of $C$ corresponding to $e$. Theorem 1 then states that $e$ is a root of $D$. $B_0$ will
denote the identity matrix.

THEOREM 2. $E(e) = \sum_{i=0}^{m} c_i B_i$, where $(c_0, c_1, \ldots, c_m)$ is a characteristic vector
of D corresponding to e.

PROOF. $E(e)$ must be a polynomial in $C$. Therefore, $E(e) = \sum_{i=0}^{m} c_i B_i$ according
to (1.1), (1.2), and (1.3). At this point in the proof $c_0, c_1, \ldots, c_m$ are arbitrary
constants. Now, $E(e)(C - eI) = 0$ since this is a property of principal
idempotent matrices for $C$ real and symmetric.

We rewrite this relation


(2.1)   $(c_0 I, c_1 I, \ldots, c_m I) \begin{pmatrix} C - eI \\ (C - eI) B_1 \\ \vdots \\ (C - eI) B_m \end{pmatrix} = 0.$


Using (1.6) and the linear independence of the B's, (2.1) yields


(2.2)   $(c_0 I, c_1 I, \ldots, c_m I) \begin{pmatrix} (d_{00} - e) I & d_{01} I & \cdots & d_{0m} I \\ d_{10} I & (d_{11} - e) I & \cdots & d_{1m} I \\ \vdots & & & \vdots \\ d_{m0} I & d_{m1} I & \cdots & (d_{mm} - e) I \end{pmatrix} = 0.$

Therefore

(2.3)   $(c_0, c_1, \ldots, c_m)(D - eI) = 0.$

If $C$ has $m^*$ distinct non-zero characteristic roots, $e_1, e_2, \ldots, e_{m^*}$, then we
may write

      $C = e_1 E(e_1) + e_2 E(e_2) + \cdots + e_{m^*} E(e_{m^*}).$


Now using Theorem 2 we have
THEOREM 3. The C matrix of a P.B.I.B. (m) may be expressed as a linear
function of the m + 1 commutative and linearly independent matrices $B_0, B_1, \ldots, B_m$.
ON A FACTORIZATION THEOREM IN THE THEORY OF ANALYTIC
CHARACTERISTIC FUNCTIONS¹


Dedicated to Paul Lévy on the occasion of his seventieth birthday


By R. G. Laha
The Catholic University of America

1. Introduction. Let $F(x)$ be a distribution function, that is, a non-decreasing
right-continuous function such that $F(-\infty) = 0$ and $F(+\infty) = 1$. The
characteristic function


(1.1)   $\phi(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$

of the distribution function $F(x)$ is defined for all real $t$. A characteristic function
is said to be an analytic characteristic function if it coincides with a regular
analytic function $\phi(z)$ in some neighborhood of the origin in the complex z-plane.
Then it follows from a theorem due to Boas [1] that the analytic characteristic
function $\phi(z)$ is also regular in a horizontal strip $-\alpha < \operatorname{Im} z < +\beta$ of the
complex z-plane containing the real axis. It is also well known that the analyticity
of the characteristic function $\phi(z)$ in the horizontal strip $|\operatorname{Im} z| < R$ ($R > 0$)
is equivalent to the condition that (a) the corresponding distribution function
$F(x)$ has moments $\mu_k$ of all orders $k$ and further (b) $\limsup_{k \to \infty} [|\mu_k|/k!]^{1/k}$ is finite
and equal to $1/R$. In other words, the analytic characteristic function $\phi(z)$ has
the power series expansion


(1.2)   $\phi(z) = \sum_{k=0}^{\infty} \frac{i^k \mu_k}{k!} z^k$

about the origin $z = 0$ in the circle $|z| < R$ (z complex) where $R > 0$ is the
radius of convergence of the series. The characteristic function ¢(z) is said to be 
an entire characteristic function if its strip of regularity comprises the whole 
complex z-plane. A summary of most of the important properties of analytic 
characteristic functions is given in a recent paper by Lukacs [6]. 

In the present paper we shall discuss some results concerning the decomposi- 
tion properties of analytic characteristic functions. In this direction a very 
interesting theorem has been recently obtained by Linnik [5], [7] which may be 
considered as an analytical extension of Cramér’s theorem [2] on the normal law. 
The theorem is as follows: 

THEOREM OF LINNIK. Let $\phi_1(t), \phi_2(t), \ldots, \phi_n(t)$ denote the characteristic
functions of some non-degenerate distributions and let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be some positive
numbers. Let the functions $\phi_j(t)$ satisfy the equation


(1.3)   $\prod_{j=1}^{n} \{\phi_j(t)\}^{\alpha_j} = \exp\{ i\mu t - \tfrac{1}{2}\sigma^2 t^2 \}$

for all real $t$ in a certain neighborhood $|t| < \delta$ ($\delta > 0$) of the origin, where $\sigma^2 > 0$
and $\mu$ are real numbers. Then each factor $\phi_j(t)$ is the characteristic function of a
normal distribution.

In the following section we shall deal with some related factorization theorems 
(Theorems 2.1 and 2.2) for analytic characteristic functions. These theorems 
may be considered as generalizations of the theorem of Linnik stated above. 


2. The Theorems. We now consider the following theorems: 

THEOREM 2.1. Let $\phi_1(t), \phi_2(t), \cdots, \phi_n(t)$ denote the characteristic functions of
some non-degenerate distributions. Let further $\phi(z)$ denote an analytic characteristic
function and $\alpha_1, \alpha_2, \cdots, \alpha_n$ be some positive numbers. Let the functions $\phi_j(t)$
satisfy the equation

(2.1)    $\prod_{j=1}^{n} \{\phi_j(t)\}^{\alpha_j} = \phi(t)$

for all real $t$ in a certain neighborhood $|t| < \delta$ ($\delta > 0$) of the origin. Then each
of the factors $\phi_j(z)$ is also an analytic characteristic function which is regular at
least in the strip of regularity of $\phi(z)$.

This theorem has already been obtained by Dugué and stated without proof
in [3]. The author has also independently obtained a proof of this theorem,
following a method closely similar to that used by Linnik in [7]. Proceeding along
the same lines as the proof of the theorem of Linnik [7], we can show that each
of the corresponding distribution functions has finite moments of all orders and
then finally each $\phi_j(z)$ is an analytic characteristic function having a power series
expansion about $z = 0$ with a positive radius of convergence. Since Linnik's
method of proof has been already presented by the author in [4], the proof of
Theorem 2.1 is omitted. It is understood that the reader may easily construct
a proof of Theorem 2.1, following the procedure indicated in [4]. We shall next
prove a related theorem on the entire characteristic function.

THEOREM 2.2. Under the same conditions as in Theorem 2.1, let $\phi(z)$ be an entire
characteristic function of some finite order $\rho$. Then each of the factors $\phi_j(z)$ is also
an entire characteristic function of finite order not exceeding $\rho$.

PROOF. First of all, we give a precise definition of the order of an entire
characteristic function. Let $f(z)$ be an entire characteristic function of some finite
order $\rho$. We denote by

(2.2)    $M(r, f) = \max_{|z| \le r} |f(z)|$

the maximum modulus of the function $f(z)$ in the circle $|z| \le r$ ($z$ complex).
This value is evidently assumed on the perimeter of this circle. Then using the
well known property of the positive definite functions

(2.3)    $\max_{-\infty < t < \infty} |f(t + iv)| \le f(iv)$    ($t$ and $v$ real)

we can easily deduce from (2.2) that

(2.4)    $M(r, f) = \max [f(ir), f(-ir)]$.

The order $\rho$ of an entire characteristic function $f(z)$ is then defined as

(2.5)    $\rho = \limsup_{r \to \infty} \frac{\ln \ln M(r, f)}{\ln r}$.

We now turn to the proof of Theorem 2.2. Without any loss of generality in
the proof, we introduce the symmetrized characteristic functions

(2.6)    $\theta_j(t) = \phi_j(t)\phi_j(-t)$,    $j = 1, 2, \cdots, n$,
         $\theta(t) = \phi(t)\phi(-t)$.

Then it is easy to verify from (2.1) that the characteristic functions $\theta_j(t)$ satisfy
the equation

(2.7)    $\prod_{j=1}^{n} \{\theta_j(t)\}^{\alpha_j} = \theta(t)$

for all real $t$ in a certain neighborhood of the origin. We can see easily that
under the conditions of the theorem the symmetric characteristic function
$\theta(z) = \phi(z)\phi(-z)$ is also an entire characteristic function of the same order $\rho$ as
$\phi(z)$. It then follows at once from Theorem 2.1 that each of the factors $\theta_j(z)$ in
Eq. (2.7) is also an entire characteristic function when $\theta(z)$ is an entire function.
Thus we have the equation

(2.8)    $\prod_{j=1}^{n} \{\theta_j(z)\}^{\alpha_j} = \theta(z)$

holding for all complex $z$.

We now consider the behavior of each of the functions $\theta_j(z)$ for purely
imaginary values of $z$. For this purpose, we substitute $z = iv$ ($v$ real) in Eq. (2.8),
thus obtaining

(2.9)    $\prod_{j=1}^{n} \{\theta_j(iv)\}^{\alpha_j} = \theta(iv)$.

Now we note that the distribution function corresponding to each $\theta_j(z)$ is
symmetric about the origin and hence has all moments of odd order equal to
zero. Let us denote by $\mu_{2k}^{(j)}$ the moment of even order $2k$ of the distribution
function corresponding to the characteristic function $\theta_j(z)$, $j = 1, 2, \cdots, n$.


Then we have

(2.10)    $\theta_j(iv) = \sum_{k=0}^{\infty} \frac{\mu_{2k}^{(j)}}{(2k)!} v^{2k}$,    $j = 1, 2, \cdots, n$.

Using (2.10) in Eq. (2.9), we have, for every $j$, the inequality

(2.11)    $\{\theta_j(iv)\}^{\alpha_j} \le \theta(iv)$.

We denote by $M(r, \theta_j)$ and $M(r, \theta)$ the maximum moduli of the characteristic
functions $\theta_j(z)$ and $\theta(z)$ respectively in the circle $|z| \le r$ ($z$ complex) as in (2.4).
Then noting the consequence of symmetrization of $\theta_j(z)$ and $\theta(z)$, we can easily
verify

(2.12)    $M(r, \theta_j) = \theta_j(ir) = \theta_j(-ir)$,    $j = 1, 2, \cdots, n$,
          $M(r, \theta) = \theta(ir) = \theta(-ir)$.

Then substituting the relations obtained in (2.12) in the inequality (2.11), we
get for every $j$

(2.13)    $\{M(r, \theta_j)\}^{\alpha_j} \le M(r, \theta)$,    $j = 1, \cdots, n$.

Then using the definition of the order of an entire characteristic function as
given in (2.5), it follows easily from (2.13) that each of the factors $\theta_j(z)$ is an
entire function of order not exceeding $\rho$. This at once establishes that each of
the factors $\phi_j(z)$ is also an entire characteristic function of order not exceeding
$\rho$, thus completing the proof of the theorem.


3. Applications. We now apply the theorems in the preceding section to give a
simple proof of the theorem of Linnik.

In this case it is given that $\phi(t) = e^{Q(t)}$, where $Q(t)$ is a quadratic polynomial
in $t$. Thus it is known that $\phi(z)$ is an entire characteristic function of order two
and without any zeros. Hence applying Theorems 2.1 and 2.2, it follows at once
that each of the factors $\phi_j(z)$ is also an entire characteristic function of order
not exceeding two and without any zeros in the complex plane. Then the proof
follows at once, by applying the factorization theorem of Hadamard to each of the
factors $\phi_j(z)$.

In conclusion the author wishes to express his thanks to Professor Eugene
Lukacs for calling his attention to the paper by Dugué [3].


BOUNDS FOR MILLS' RATIO FOR THE TYPE III POPULATION


By A. V. Boyd


University of the Witwatersrand


1. Introduction and summary. Cohen [1] and Des Raj [2] have shown that in
estimating the parameters of truncated type III populations, it is necessary to
calculate for several values of $x$ the Mills ratio of the ordinate of the standardized
type III curve at $x$ to the area under the curve from $x$ to $\infty$. Des Raj [3]
has also noted that for large values of $x$ the existing tables of Salvosa [4] are
inadequate for this purpose and he has found lower and upper bounds for the
ratio. The object of this note is to improve these bounds, by obtaining monotonic
sequences of lower and upper bounds through the use of continued fractions.


2. Approximations to the ratio. Taking the type III population in the
standardized form

    $C f(x)\,dx$,    $-2/a \le x \le \infty$,

where

    $f(x) = \left(1 + \frac{ax}{2}\right)^{(4/a^2) - 1} e^{-2x/a}$

and

    $C = (4/a^2)^{(4/a^2) - 1/2}\, e^{-4/a^2}\, [\Gamma(4/a^2)]^{-1}$,

Des Raj [3] puts

    $G(x) = \int_x^{\infty} f(t)\,dt$  and  $\mu(x) = f(x)/G(x)$

and obtains

    $\frac{2x + a}{ax + 2} < \mu(x) < \frac{2}{(x^2 + 2ax + 4)^{1/2} - x}$.


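However, by making the substitution $a^2 v = 2(at + 2)$ in the integral for $G(x)$,
we find

    $G(x) = \frac{a}{2}\, \alpha^{1-\alpha} e^{\alpha} \int_X^{\infty} e^{-v} v^{\alpha - 1}\,dv$,

where

    $\alpha = 4/a^2$  and  $X = 2(ax + 2)/a^2$.

Now, by Wall [5] equation (92.9),

    $\int_X^{\infty} e^{-v} v^{\alpha - 1}\,dv = e^{-X} X^{\alpha} \left\{ \frac{1}{X+}\, \frac{1-\alpha}{1+}\, \frac{1}{X+}\, \frac{2-\alpha}{1+}\, \frac{2}{X+}\, \frac{3-\alpha}{1+}\, \frac{3}{X+} \cdots \right\}$

for all $\alpha$ if $X > 0$. On substituting and simplifying it is then found that for
$x > -2/a$

    $1/\mu(x) = \frac{a}{2}\, X \left\{ \frac{1}{X+}\, \frac{1-\alpha}{1+}\, \frac{1}{X+}\, \frac{2-\alpha}{1+}\, \frac{2}{X+}\, \frac{3-\alpha}{1+}\, \frac{3}{X+} \cdots \right\}$.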

The approximants to the continued fraction on the right-hand side lead to
approximations to $\mu(x)$. The first seven of these are

    $\mu_1(x) = 2/a$,    $\mu_2(x) = \frac{2x + a}{ax + 2}$,    $\mu_3(x) = \frac{4(x + a)}{2ax + a^2 + 4}$,

    $\mu_4(x) = \frac{2(2x^2 + 4ax + a^2 + 2)}{(ax + 2)(2x + 3a)}$,

    $\mu_5(x) = \frac{2(2x^2 + 6ax + 3a^2 + 2)}{2ax^2 + (5a^2 + 4)x + a^3 + 10a}$,

    $\mu_6(x) = \frac{2(4x^3 + 18ax^2 + 6(3a^2 + 2)x + 3a^3 + 14a)}{(ax + 2)(4x^2 + 16ax + 11a^2 + 8)}$,

    $\mu_7(x) = \frac{8(x^3 + 6ax^2 + 3(3a^2 + 1)x + 3a^3 + 5a)}{4ax^3 + 2(11a^2 + 4)x^2 + 26a(a^2 + 2)x + 3a^4 + 52a^2 + 16}$.
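
The approximants above can also be generated numerically from the usual three-term recurrence for continued fractions. The following Python sketch is an illustration only: the function name `mu_approximants` is hypothetical, and the code assumes the partial numerators $1, 1-\alpha, 1, 2-\alpha, 2, 3-\alpha, 3, \cdots$ and partial denominators $X, 1, X, 1, \cdots$ with $\alpha = 4/a^2$ and $X = 2(ax + 2)/a^2$.

```python
def mu_approximants(x, a, count):
    """Return [mu_1(x), ..., mu_count(x)] from the continued fraction for 1/mu(x)."""
    alpha = 4.0 / a**2
    X = 2.0 * (a * x + 2.0) / a**2
    # A_k / B_k is the k-th approximant of the continued fraction in braces.
    A_prev, A = 1.0, 0.0
    B_prev, B = 0.0, 1.0
    approximants = []
    for k in range(1, count + 1):
        if k % 2 == 1:
            # odd positions: numerators 1, 1, 2, 3, ... over denominator X
            numerator = 1.0 if k == 1 else float(k // 2)
            denominator = X
        else:
            # even positions: numerators 1 - alpha, 2 - alpha, ... over denominator 1
            numerator = k // 2 - alpha
            denominator = 1.0
        A_prev, A = A, denominator * A + numerator * A_prev
        B_prev, B = B, denominator * B + numerator * B_prev
        # 1/mu_k = (a/2) * X * A_k / B_k
        approximants.append(2.0 * B / (a * X * A))
    return approximants


if __name__ == "__main__":
    for r, value in enumerate(mu_approximants(0.0, 1.5, 7), start=1):
        print(f"mu_{r}(0) = {value:.4f}")
```

With $a = 3/2$ and $x = 0$ the sketch can be compared against the closed forms listed above; when $\alpha$ is an integer $n$ the $(2n)$th partial numerator is zero, so the approximants stop changing from $\mu_{2n-1}$ onward.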


It should be noted that $\mu_2(x)$ is Des Raj's lower bound for $\mu(x)$. By elementary
algebra it can be shown that $\mu_3(x)$ exceeds $\mu_2(x)$ for all relevant $a$ and $x$; and
that, for all relevant $a$ and for $x > \max (0, 2/a - 2a)$, $\mu_4(x)$ is less than Des
Raj's upper bound.


TABLE I
Values of $\mu_r(x)$ when $\alpha = 4$
(The entries of this table are not legible in the source.)


TABLE II
Values of $\mu_r(x)$ when $\alpha = 16/9$, $a = 3/2$

     x      mu_2(x)    mu_4(x)    mu_3(x)
  -0.50     0.400      0.800      0.842
   0.00     0.750      0.944      0.960
   0.50     0.909      1.024      1.032
   1.00     1.000      1.076      1.081
   1.50     1.059      1.114      1.116
   2.00     1.100      1.141      1.143
   2.50     1.131      1.162      1.163
   3.00     1.154      1.180      1.180
   3.50     1.172      1.193      1.194
   4.00     1.188      1.205      1.206

Further, for $x = 0$, $\mu_5 = 0.9523$, $\mu_6 = 0.9504$, and $\mu_7 = 0.9515$.


3. Convergence of the approximants for integral $\alpha$. We suppose henceforth
that $x > -a/2$. (All the inequalities to be derived appear to hold over at least
part of the range $-2/a \le x \le -a/2$, but as we are interested only in large
positive $x$ we shall not worry to extend their range of validity.) If $\alpha = n$ then
$X + i - \alpha = (2x + ia)/a > 0$ for $i = 1, 2, 3, \cdots$ and

    $i - \alpha < 0$  for $i = 1$ to $n - 1$,
    $i - \alpha = 0$  for $i = n$.

Hence, by considering the approximants to

    $\frac{1}{X+}\, \frac{1-\alpha}{1+}\, \frac{1}{X+}\, \frac{2-\alpha}{1+} \cdots$

it is easily verified that $\mu_1(x), \mu_2(x), \cdots, \mu_{2n-1}(x)$ satisfy the inequalities

    $\mu_2 < \mu_3 < \mu_6 < \mu_7 < \cdots < \mu_{2n-1} < \cdots < \mu_8 < \mu_5 < \mu_4 < \mu_1$.

$\mu_{2n-1}(x)$ is of course equal to $\mu(x)$ since the $(2n)$th partial numerator of the
continued fraction vanishes. The rapidity of the convergence of the sequence $\mu_r(x)$
in the case $\alpha = 4$ is indicated by Table I, where Des Raj's numerical bounds
[3] are included for comparison.


4. Convergence of the approximants for non-integral $\alpha$. If $n < \alpha < n + 1$
then $X + i - \alpha > 0$ for $i = 1, 2, \cdots$ and

    $i - \alpha < 0$  for $i = 1$ to $n$,
    $i - \alpha > 0$  for $i = n + 1, n + 2, \cdots$

so that $\mu_1(x), \cdots, \mu_{2n}(x)$ satisfy the same inequalities as in the case of integral
$\alpha$, while $\mu_{2n-1}(x), \mu_{2n+1}(x), \mu_{2n+3}(x), \cdots$ and $\mu_{2n}(x), \mu_{2n+2}(x), \mu_{2n+4}(x), \cdots$ form
monotonic sequences approaching $\mu(x)$, one from above and the other from
below. Thus if $2r - 1 < \alpha < 2r$ then we have

    $\mu_2 < \mu_3 < \mu_6 < \mu_7 < \cdots < \mu_{4r-10} < \mu_{4r-9} < \mu_{4r-6} < \mu_{4r-5}$
      $< \mu_{4r-2} < \mu_{4r} < \mu_{4r+2} < \cdots < \mu < \cdots < \mu_{4r+1} < \mu_{4r-1} < \mu_{4r-3}$
      $< \mu_{4r-4} < \mu_{4r-7} < \mu_{4r-8} < \cdots < \mu_9 < \mu_8 < \mu_5 < \mu_4 < \mu_1$

and if $2r < \alpha < 2r + 1$ then

    $\mu_2 < \mu_3 < \mu_6 < \mu_7 < \cdots < \mu_{4r-6} < \mu_{4r-5} < \mu_{4r-2} < \mu_{4r-1}$
      $< \mu_{4r+1} < \mu_{4r+3} < \cdots < \mu < \cdots < \mu_{4r+4} < \mu_{4r+2} < \mu_{4r} < \mu_{4r-3}$
      $< \mu_{4r-4} < \mu_{4r-7} < \mu_{4r-8} < \cdots < \mu_9 < \mu_8 < \mu_5 < \mu_4 < \mu_1$.

Table II indicates the rapidity of the convergence of $\mu_r$ in the case $\alpha = 16/9$.


ON THE COMMUTATIVITY OF OPERATORS IN STOCHASTIC MODELS
FOR LEARNING¹


By Manfred Kochen
Harvard University²


Introduction. Bush and Mosteller [1] have shown that a very fruitful model for
the analysis of certain experiments on learning in animals can be developed in
terms of linear operators, $Q_i$, which are defined as follows:

    $Q_i p = \alpha_i p + (1 - \alpha_i)\lambda_i$,    $0 \le p \le 1$,    $0 \le Q_i p \le 1$.


The probability (measured as the relative frequency over a number of
supposedly identical animals) that an animal makes a certain one of two possible
responses on the $k$th trial is denoted by $p_k$, to be substituted for $p$ in the above
equation. The two alternatives might be going to the right and to the left in a
T-maze, and $p$ might be the probability of going to the right. The variable
$Q_i p_k$ represents the probability that the animal makes the proper response
(e.g. going to the right) on the $(k + 1)$st trial after the occurrence of the $i$th of
several possible events. It is often sufficient to consider only two events, $E_1$ and
$E_2$ (e.g. reward and punishment) and their associated operators $Q_1$ and $Q_2$. The
learning process is assumed to be described by the following recursive
(Markov-type) relation:

    $p_{k+1} = Q_i p_k = \alpha_i p_k + (1 - \alpha_i)\lambda_i$,    $0 \le p_k \le 1$,    $0 \le Q_i p_k \le 1$,    $k = 0, 1, 2, \cdots$


after event $E_i$ has occurred. The parameters $\alpha_i$, $\lambda_i$, $i = 1, 2$, are to be statistically
estimated in order to obtain a good fit between computed and observed data.
If, for instance, the sequence of events $E_1 E_2 E_1 E_2$ were to occur, then $p_4 =
Q_2 Q_1 Q_2 Q_1 p_0$. The estimation of $\alpha_1$, $\alpha_2$, $\lambda_1$, $\lambda_2$ from even this 4-trial experiment
presents considerable technical difficulties. If it were known, however, that the
two operators commute, then $p_4 = Q_1^2 Q_2^2 p_0$, which simplifies the estimation
problem considerably. If the operators do not commute, and nothing appears to
indicate that they do in general, it might be inquired if there is not some function
taking $p_k$ into $f(p_k)$ such that the induced operators on $f(p_k)$ will commute.
Results. Consider the closed unit interval [0, 1], and let $p$ be any point in it.
From the restriction that $0 \le Q_i p \le 1$, it is easily deduced* that $0 \le \lambda_i \le 1$
and


Max — S a; = 1, i = 1,2. 
k eS 


Let $f$ be a continuous function on [0, 1]. Suppose that the operator $Q_i$ on $p$
induces a transformation $T_i$ on $f(p)$ such that

    $f(Q_i p) = T_i f(p)$  for every $p \in [0, 1]$.

The question arises whether there exists an $f$ with the above properties and such
that

    $T_1 T_2 f(p) = T_2 T_1 f(p)$  for all $p \in [0, 1]$

regardless of whether $Q_1 Q_2 p = Q_2 Q_1 p$. The following result answers this question:
THEOREM. $T_1 T_2 f(p) = T_2 T_1 f(p)$ if and only if $f$ is a periodic function with
period $(1 - \alpha_1)(1 - \alpha_2)(\lambda_1 - \lambda_2)$.
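PROOF.
(a) Suppose that

    $T_1 T_2 f(p) = T_2 T_1 f(p)$.

Then

    $(T_1 T_2 - T_2 T_1) f(p) = 0$.

Observe that

    $T_1 T_2 f(p) = T_1 f(Q_2 p) = f(Q_1 Q_2 p)$,

so that

    $(T_1 T_2 - T_2 T_1) f(p) = f(Q_1 Q_2 p) - f(Q_2 Q_1 p) = 0$.

Now

    $Q_1 Q_2 p = \alpha_1 [\alpha_2 p + (1 - \alpha_2)\lambda_2] + (1 - \alpha_1)\lambda_1 = ap + b$,

where

    $a = \alpha_1 \alpha_2$  and  $b = \alpha_1 (1 - \alpha_2)\lambda_2 + (1 - \alpha_1)\lambda_1$,

and

    $Q_2 Q_1 p = \alpha_2 [\alpha_1 p + (1 - \alpha_1)\lambda_1] + (1 - \alpha_2)\lambda_2 = ap + b'$,

where

    $b' = \alpha_2 (1 - \alpha_1)\lambda_1 + (1 - \alpha_2)\lambda_2$.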


* R. R. Bush, F. Mosteller and G. L. Thompson, "A Formal Structure for Multiple-Choice
Situations", Decision Processes, Eds. Thrall, Coombs and Davis, J. Wiley and Sons, N. Y.,
1954, Ch. VIII.

Hence

    $f(ap + b) - f(ap + b') = 0$  for all $p \in [0, 1]$.

Let

    $q = ap + b'$  so that  $f(q) = f(q + (b - b'))$.

This defines a periodic function with period

    $\mu = b - b' = \alpha_1 (1 - \alpha_2)\lambda_2 + (1 - \alpha_1)\lambda_1 - \alpha_2 (1 - \alpha_1)\lambda_1 - (1 - \alpha_2)\lambda_2$
      $= (1 - \alpha_1)(\lambda_1 - \alpha_2 \lambda_1) + (1 - \alpha_2)(\alpha_1 \lambda_2 - \lambda_2)$
      $= (1 - \alpha_1)\lambda_1 (1 - \alpha_2) + (1 - \alpha_2)\lambda_2 (\alpha_1 - 1)$
      $= (1 - \alpha_1)(1 - \alpha_2)(\lambda_1 - \lambda_2)$.

(b) Now suppose that $f(p) = f(p + \mu)$ for all $p \in [0, 1]$ and some $\mu$. Then
$f(Q_1 Q_2 p) - f(Q_2 Q_1 p) = 0$ only if $(Q_1 Q_2 - Q_2 Q_1)p = k\mu$, $k = 0, 1, 2, \cdots$. But

    $(Q_1 Q_2 - Q_2 Q_1)p = (1 - \alpha_1)(1 - \alpha_2)(\lambda_1 - \lambda_2) = \mu$.

Letting $k = 1$, $\mu$ has the same value as above, and $(T_1 T_2 - T_2 T_1) f(p) = 0$.
QED. All the equal signs should be understood as identities.
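
The theorem can be checked numerically. The Python sketch below is an illustration under assumed parameter values (the numbers and the names `make_Q` and `commutator` are hypothetical, not taken from the paper): it confirms that $Q_1 Q_2 p - Q_2 Q_1 p$ is the same constant for every $p$, and that a function with that constant as period takes equal values at $Q_1 Q_2 p$ and $Q_2 Q_1 p$.

```python
import math


def make_Q(alpha, lam):
    """Linear operator Q p = alpha * p + (1 - alpha) * lam."""
    return lambda p: alpha * p + (1.0 - alpha) * lam


def commutator(alpha1, lam1, alpha2, lam2):
    """The constant (Q1 Q2 - Q2 Q1) p = (1 - alpha1)(1 - alpha2)(lam1 - lam2)."""
    return (1.0 - alpha1) * (1.0 - alpha2) * (lam1 - lam2)


# Illustrative parameter values only.
alpha1, lam1, alpha2, lam2 = 0.3, 0.9, 0.6, 0.2
Q1, Q2 = make_Q(alpha1, lam1), make_Q(alpha2, lam2)
mu = commutator(alpha1, lam1, alpha2, lam2)


def f(p):
    """One periodic function with period mu."""
    return math.sin(2.0 * math.pi * p / mu)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 1.0):
        gap = Q1(Q2(p)) - Q2(Q1(p))
        print(p, gap - mu, f(Q1(Q2(p))) - f(Q2(Q1(p))))
```

Both printed differences should be zero up to rounding error, whatever value of $p$ is used.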

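COROLLARY 1. If $Q_1$ and $Q_2$ commute, then $\mu = 0$. This clearly occurs if and
only if: $\alpha_1 = 1$ or $\alpha_2 = 1$ or $\lambda_1 = \lambda_2$.

COROLLARY 2. If $0 \le \alpha_i \le 1$ and $0 \le \lambda_i \le 1$ then $|\mu| \le 1$, with $|\mu| = 1$ iff
$\alpha_1 = \alpha_2 = 0$ and $\lambda_1 = 0$, $\lambda_2 = 1$ or $\lambda_1 = 1$, $\lambda_2 = 0$.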

Suppose that $Q_1$ and $Q_2$ do not commute. It is then desirable that $f$ can
transform $p_0$ such that

    $Q_1 Q_2 p_0 = f^{-1} T_1 T_2 f(p_0) = f^{-1} T_2 T_1 f(p_0)$.

Clearly, since $f$ is periodic, it will not have a single-valued inverse. However,
if bounds on $Q_1 Q_2 p_0$ are known, $A \le Q_1 Q_2 p_0 \le B$, such that $B - A \le \mu/2$, it may
be possible to recover $p_2 = Q_1 Q_2 p_0$. For experiments in which the probability
of one response becomes eventually very high and that of the other very low,
$|\lambda_1 - \lambda_2| \approx 1$. If, in addition, the experiment is such that the event $E_1$ has the
same effect on one response as the event $E_2$ has on the other, $\alpha_1$ may be taken
equal to $\alpha_2$. Call the common value $\alpha$. Finally, if it can be estimated that $\alpha$
does not exceed some number $C$ (e.g. 1/2) then $\mu/2 \ge (1 - C)^2/2$. This bound
is largest when $C \approx 0$, and this implies that $\mu \approx 1$, by the above corollary. In
this case $f$ may have a single-valued inverse. In general, to have a single-valued
inverse $f$ ought to be monotonic inside $[A, B]$ provided that

    $A \le p_k \le B$,    $k = 0, 1, 2, \cdots$

For instance, if $\mu = 1/2$ and $f(p) = \sin (2\pi/(1/2))p$, and $7/8 \le p_k \le 1$, $k = 0, 1,
2, \cdots$, then $f(p_k)$ has a single-valued inverse, and the commutativity of $T_1$ and
$T_2$ can be utilized.

General Remarks. Consider the case where there are $r$ instead of 2 response
classes. Then it is convenient to regard the $r$ probabilities $p_1, \cdots, p_r$ as a
normalized column vector, $p$. With $t$ possible events, there are $t$ corresponding linear
operators, which can be represented by $t$ $r \times r$ stochastic matrices, $M_1, \cdots,
M_i, \cdots, M_t$. Then, the value of the vector $p$ at the $(k + 1)$st trial, after the
occurrence of event $E_i$, is given by $M_i p_k$, where $p_k$ is the value of the vector at
the $k$th trial. Under the assumption of combining classes, $T_i$ may be written as
$M_i = \alpha_i I + (1 - \alpha_i)\Lambda_i$ where $I$ is the $r \times r$ identity matrix, and $\Lambda_i$ is an $r \times r$
matrix in which all columns are identical, and the $r$ entries are denoted by
$\lambda_i^{(1)}, \cdots, \lambda_i^{(r)}$. It is then readily shown that the commutator of $M_i$ and $M_j$ is
the vector: $\mu = (1 - \alpha_i)(1 - \alpha_j)(\Lambda_i - \Lambda_j)^*$. The last term $(\Lambda_i - \Lambda_j)^*$ is any
of the $r$ identical column vectors of the matrix $(\Lambda_i - \Lambda_j)$. It is now necessary
to find $f$ such that $f(M_i p) = T_i f(p)$ and such that $T_i T_j f(p) = T_j T_i f(p)$, where
$f(p)$ denotes the column vector with elements $f(p_1), \cdots, f(p_r)$. The theorem goes
through as before, these conditions being satisfied if and only if $f$ is periodic with
$f(p) = f(p + \mu)$, where $\mu$ is the commutator vector defined above. The
determination of conditions under which $f$ has an inverse is a somewhat deeper question.
For the present, it is sufficient to remark that if the $g$th component of $p_k$ is
bounded by $A_g$ and $B_g$ for some $g \le r$ and $f$ is monotone in $[A_g, B_g]$, then $f$
has an inverse in that region, and the values of this $g$th component on successive
trials can be used to estimate the parameters.

Returning to the case of $r = 2$ and $t = 2$, it appears that for a given $Q_1$ and
$Q_2$ half the commutator, $\mu/2$, gives a measure of the largest set of values of $p$ on
which it is possible to find a 1-1 mapping $f$ such that the induced
transformations $T_1$ and $T_2$ commute. At the same time, $\mu$ also gives a measure of the fraction
of the interval [0, 1] on which the commutativity of $Q_1$ and $Q_2$ fails to hold.


REFERENCE
[1] Stochastic Models for Learning, John Wiley and Sons, New York, 1955.


ADDENDA TO "INTRA BLOCK ANALYSIS FOR FACTORIALS IN TWO-
ASSOCIATE CLASS GROUP DIVISIBLE DESIGNS"¹


By Ralph Allan Bradley and Clyde Young Kramer
Virginia Polytechnic Institute


1. Nair and Rao [1] in a very fundamental paper discussed confounding in
asymmetrical (asymmetrical in the factor levels) factorial experiments. They
gave a general formulation of the combinatorial set-up for balanced confounded
designs, assuming their existence, of asymmetrical factorial experiments and
showed how to construct some optimum designs for two-factor experiments
with some extensions to three and four factors.

Received April 7, 1958.

1 Research sponsored by the Statistics Branch, Office of Naval Research. Reproduction
in whole or in part is permitted for any purpose of the United States Government.

Requirements for balanced confounded designs of asymmetrical factorials were
set forth. Using their notation, we let $(i_1, \cdots, i_m)$ be the treatment
combination with the $i_l$th level of factor $F_l$, $l = 1, \cdots, m$, $F_l$ having $s_l$ levels. There are
$v = \prod_{l=1}^{m} s_l$ treatment combinations to be arranged in $b$ blocks of $k$ experimental
units with no treatment combination on two units of the same block.
Requirements for balanced confounding were:

(i) Every treatment combination is replicated $r$ times.

(ii) The treatments $(i_1, \cdots, i_m)$ and $(j_1, \cdots, j_m)$ occur together in $\lambda_{k_1 \cdots k_m}$
blocks where $k_l = 0$ or $1$ as $i_l = j_l$ or $i_l \ne j_l$.

Nair and Rao discussed two-factor experiments in detail showing the estima- 
tion of treatment differences, efficiency and amount of information, and tests of 
significance. 


2. Nair [2] in a short paper in 1953 showed that the earlier work of Bose and 
Connor [3] on group divisible, partially balanced, incomplete block designs 
with two associate classes could be regarded as a special case of the analysis 
for confounded asymmetrical factorial experiments with two factors. Also, he 
showed that designs constructed by Nair and Rao correspond to designs of the 
semi-regular class of group divisible designs typed by Bose and Shimamoto [4]. 


3. Kramer and Bradley [5], using group divisible designs catalogued by Bose, 
Clatworthy, and Shrikhande [6], showed how factorial treatment combinations 
may be used in these designs and presented the straight-forward least squares 
derivation of the intra-block analysis for such experiments. This essentially 
completes the cycle. The discussion of confounding in asymmetrical factorials 
is the most general of the papers; the factors could be regarded as pseudo-factors 
to derive the analysis for non-factorial treatments in the two-associate class 
group divisible designs. Finally, the treatments in the group divisible designs 
were replaced by factorial treatment combinations to produce confounded 
asymmetrical factorials. 


4. Analyses for the basic two-factor factorial in [5] could have been based on 
the work of Nair and Rao [1] and Nair [2]. The association of notation (the 
Bradley-Kramer notation followed by that of Nair and Rao), where notations 
differed, is as follows: 


M, 82 5M, 8 3 Ar, As*o 3 Av, Aor = And; (Ay + rk — r)/k, pu = Pr. ; 
mnr2/k, pr ; Qi; , O(4, 7); 
ti; , t(t, 7); A-factor, F2-factor; and C-factor, F;-factor. 


The association of notations leads to equivalences of results. In the order as
before, Table 1 corresponds to Table 2, variances of effects in (5.22) and (5.23)
with (3.23) and (3.22), and efficiencies (5.27), (5.28), and (5.29) with those
indicated on the bottom of page 113 of [1].


5. We are indebted to K. R. Nair for drawing these matters to our attention. 


REFERENCES
[1] K. R. Nair and C. R. Rao, "Confounding in asymmetrical factorial experiments,"
J. Roy. Stat. Soc. B, Vol. 10 (1948), pp. 109-131.
[2] K. R. Nair, "A note on group divisible incomplete block designs," Calcutta Stat. Assoc.
Bull., Vol. 5 (1953), pp. 30-35.
[3] R. C. Bose and W. S. Connor, "Combinatorial properties of group divisible incomplete
block designs," Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.
[4] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced
designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-190.
[5] C. Y. Kramer and R. A. Bradley, "Intra-block analysis for factorials in two-associate
class group divisible designs," Ann. Math. Stat., Vol. 28 (1957), pp. 349-361.
[6] R. C. Bose, W. H. Clatworthy, and S. S. Shrikhande, "Tables of partially balanced
designs with two associate classes," Tech. Bull. No. 107 (1954), North Carolina Agri-


ACKNOWLEDGMENT OF PRIORITY


By John S. White


It has been called to my attention that the results in my note 'A t-test for
the serial correlation coefficient' (Ann. Math. Stat., Dec. 1957) duplicate
results obtained by M. H. Quenouille in 'Approximate tests of correlation in
time-series 3' (Proc. Cambridge Phil. Soc., Vol. 45, part 3, 1949). I wish to
acknowledge the priority of Prof. Quenouille's results which were inadvertently
overlooked.




CORRECTION TO “ON THE POWER OF CERTAIN TESTS FOR 
INDEPENDENCE IN BIVARIATE POPULATIONS” 


By H. S. Konijn


p. 304, line 13: like the left-hand side, the right-hand side is a function of n*.

p. 305: beginning with the word "exists" Theorem 1.2 should read the same
as Theorem 1.1, except that the exponent changes from 1/h to 1/hp*.

p. 306, line 1: change "of" to "at".

p. 309, line 3: insert "if p exists," preceding the expression for ER, .

p. 309, last line of section 1: for ER, = 0 read ER, — 0.

p. 309, line 8 of section 2: change "consist merely of" to "contain", and "or"
to "plus".

p. 309, line 3 from below: change A to A — {)’}.

p. 310, line 1: change "is independent" to "is the distribution of two
independent random variables".




CORRECTION TO “THE WAGR SEQUENTIAL T-TEST REACHES A 
DECISION WITH PROBABILITY ONE” 


By Herbert T. David and William H. Kruskal


Two corrections to the paper of the above title (Ann. Math. Stat., Vol. 27 (1956), pp.
797-805) should be made.


(1) Page 803, line after (4.2): Kx/1 + K? should be replaced by K, V/1 + R?. 
(2) Page 804, line 4: v,.(A, — R,) should be replaced by +~/n (A, — R,). 


————————— 


ABSTRACTS OF PAPERS 


(Abstracts of papers presented for the Ames, Iowa Meeting of the Institute, April 3-6, 1958.)


41. Similar Tests of Hypotheses Concerning the Ratio of Mean to Standard
Deviation in a Normal Population. Robert A. Wijsman, University of
Illinois.


Let $X_1, \cdots, X_N$ be independent $N(\mu, \sigma^2)$ variables, and consider the hypothesis that
$\mu/\sigma$ equals a given value against various alternatives. Let

    $T_1 = \sum X_i^2$,  $T_2 = \sqrt{N}\,\bar{X}$,  $T = (T_1, T_2)$,  $\tau = \sqrt{N}\,\mu/\sigma$.

Then the density of $T$ is $c(\sigma, \tau) h(t) \exp [-(t_1/2\sigma^2) + (\tau/\sigma) t_2]$ with $h(t) = (t_1 - t_2^2)^{n/2 - 1}$ if
$t_1 \ge t_2^2$ and $h = 0$ otherwise (we have put $n = N - 1$). Let the hypothesis be $\tau = \tau_0$.
Associated with the exponential is a differential operator $D = \partial^2/\partial t_2^2 - 2\tau_0^2\, \partial/\partial t_1$. For a certain
class $C$ of functions $G$ of $t$ the test function $\alpha + \phi(t)$ with $\phi = h^{-1} DG$ will be similar and
of size $\alpha$. Conversely, to any similar test function $\alpha + \phi(t)$ there corresponds a $G \in C$,
obtained by considering the differential equation $DG = h\phi$ as a heat (or diffusion) problem
in one dimension, with a heat source density $h\phi$ which is a function of both position ($t_2$)
and time ($t_1$), and solving the equation with help of the usual Green's function for the heat
equation. Some of the unsolved problems concerning the search for an optimum similar
test are indicated. (Rec. April 3, 1958)


(Abstracts of papers presented for the Los Angeles Meeting of
the Institute, December 27-28, 1957.)


42. Demand for and Allocation of Engineering Personnel. I. Estimation of the
Demand for Engineering Personnel, and General Formulation of the
Allocation Problem. Rajendra Kashyap


Historical data for manpower and costs are analyzed for several types of contracts
(prototype, initial, and follow-on contracts) with special regard to routines for (1)
dissection of multiphase distributions with overlapping significant phases; (2) determination
of standard patterns for incremental and cumulative manpower and costs; (3) estimation
of total manpower and costs. As to (1), graphical procedures may be useful (Gibrat, Daeves, 
etc.). For (2), the Pearson curve types may be applied, or the Edgeworth-Kapteyn system, 
which is closely related to the application of Hermitian polynomials, a method that for 
several reasons may deserve preference above all competing devices. (3) is a typical regres- 
sion problem, the affinity and the effectivity of the chosen approach to be checked by 
Fisher’s and Student’s tests respectively. The problem of allocation of engineering personnel 
involves the determination of an optimal scheme for the allocation of available personnel 
to meet the demand for these personnel by the engineering units. This allocation has to be 
satisfactory under surplus as well as under shortage conditions. The simple consideration 
of manpower transfer to alternative fields of engineering activities shows clearly that 
optimization is necessarily an overall group problem. It can be described by an objective 
function considering competitive ability ratings in various fields, under the aspect of 
some suitable optimality criterion concerning costs, output or parametric quality-level. 
Thus the complex problem is formally reduced to one in linear programming. (Received 
March 14, 1958.) 


43. Demand for and Allocation of Engineering Personnel. II. Integral-Valued
Solutions of Allocation Problems. Herman W. Von Guerard


Analysis of proportional representation, allocation or elimination of units is bound to 
integral-valued solutions. In consequence, proportionality, in general, can be approached 
only, and that leads to a problem of optimization. Unfortunately, that does not provide 
by itself the criterion for the least deviation from proportionality. Rounding procedures, 
in general, are not satisfactory. The main issue is, in terms of political elections, that no 
party is presumed to score less by the only reason that the total number of seats has been 
increased (postulate of monotony). Other criteria, based on least squares or on minimizing 
Gram’s determinant (i.e. maximizing linear dependence), are subject to the same con- 
siderations. The best expedient may be seen in requiring maximum likelihood to straight 
proportionality, and that is equivalent to sampling with replacement (the homogeneous 
case). The still more important procedure of sampling without replacement leads to 
d’Hondt’s scheme (the inhomogeneous case), which is equivalent to maximum likelihood 
after adding one unit to each of the initial frequencies, i.e. to the popular votes per party.
Most of the related theorems can be easily visualized by multidimensional geometry of
numbers (Minkowski), where d’Hondt’s method of successive divisions is represented by 
successive penetrations of a vector through hyperplanes. (Received March 14, 1958.) 
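
d'Hondt's method of successive divisions, mentioned in the abstract above, can be sketched in a few lines of Python. This is a generic illustration of the well-known procedure, not code from the abstract; the function name `dhondt` and the vote counts are assumptions chosen for the example. Each seat in turn goes to the party with the largest quotient votes / (seats already won + 1).

```python
def dhondt(votes, seats):
    """Allocate `seats` among parties with the given vote counts (d'Hondt)."""
    allocation = [0] * len(votes)
    for _ in range(seats):
        # Quotient for each party: votes divided by (seats held + 1).
        quotients = [v / (s + 1) for v, s in zip(votes, allocation)]
        # Ties, if any, are broken in favor of the lowest party index.
        winner = max(range(len(votes)), key=lambda i: quotients[i])
        allocation[winner] += 1
    return allocation


if __name__ == "__main__":
    votes = [100000, 80000, 30000, 20000]  # illustrative vote counts
    for total in range(1, 9):
        print(total, dhondt(votes, total))
```

Because seats are awarded one at a time and never withdrawn, no party can lose a seat when the total number of seats is increased, which is the postulate of monotony referred to in the abstract.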


(Abstracts of papers presented for the Cambridge, Massachusetts
Meeting of the Institute, August 25-30, 1958.)


44. On the Asymptotic Minimax Character of the Sample d.f. of Vector Chance
Variables. J. Kiefer and J. Wolfowitz, Cornell University. (By title)


Let $\mathcal{F}$ (resp., $\mathcal{F}^c$) denote the class of all d.f.'s (resp., continuous d.f.'s) on Euclidean
$m$-space $R^m$. Let $X_1, \cdots, X_n$ be independent chance $m$-vectors with common unknown
d.f. $F$. The space $D$ of decisions (values of the estimate of $F$) is any space of real functions
$d$ on $R^m$ which includes all possible realizations of the sample d.f. $S_n$ of $X_1, \cdots, X_n$. Let
$\phi_n^*$ be the decision function which always makes decision $S_n$. Dvoretzky, Kiefer and
Wolfowitz showed in Ann. Math. Stat., 1956, that, when $m = 1$, $\phi_n^*$ is asymptotically minimax
(as $n \to \infty$) for estimating $F$ in $\mathcal{F}$ or $\mathcal{F}^c$, for any of a wide class of loss functions. In the
present paper analogous results are proved when $m > 1$, despite the fact that $S_n$ no longer
has the distribution-free property it has when $m = 1$. The resulting nonconstancy of the
risk function $r(F, \phi_n^*)$ for $F$ in $\mathcal{F}^c$ and even the simplest loss functions, presents new
difficulties in the minimax proof when $m > 1$: for example, the method of proof necessitates
showing that $r(F, \phi_n^*)$ approaches a limit as $n \to \infty$, uniformly for $F$ in an appropriately
dense subset of $\mathcal{F}^c$; the authors' results in Trans. Amer. Math. Soc., 1958, are used in proving
this. (Received March 21, 1958.)


45. Optimum Designs in Regression Problems. J. Krerer anp J. WoLFrow!Tz, 
Cornell University. (By title) 


Suppose $Y_i$, $i = 1, \dots, n$, are independent random variables with $EY_i = \sum_{j=1}^{k} a_j f_j(x_i)$
for $x_i \in X$, where the $f_j$ are known and the $a_j$ are the unknown regression coefficients;
$\mathrm{Var}(Y_i) = v(x_i)\sigma^2$, where $v$ is known. We consider the optimum allocation of the $x_i$ for
problems of statistical inference (1) about $a_k$, (2) about the $s$ parameters $a_{k-s+1}, \dots, a_k$,
(3) about the whole function $\sum_j a_j f_j$. Algorithms are obtained which facilitate the
computation of optimum designs (for several different optimality criteria, in the case of (2)).
Examples are given which show the great simplification to be achieved by the use of these
algorithms, over a more direct approach. For example, in case (1) the problem is solved by
finding the best Chebyshev approximation to $f_k$ of the form $\sum_{j=1}^{k-1} c_j f_j$ and locating the
$x_i$, with appropriate frequencies, at points of maximum absolute deviation of the best
approximation from $f_k$; in the example $X = [-1, 1]$, $f_j(x) = x^{j-1}$, $k = h + 1$, the optimum
design locates a fraction $1/(2h)$ of the observations at each of $-1$ and $1$ and a fraction $1/h$
of the observations at $\cos(j\pi/h)$, $1 \le j \le h - 1$, and, as $h$ increases, the relative
efficiency of the often used “equal spacing” designs tends rapidly to zero. (Received April
17, 1958.)
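
The polynomial example in abstract 45 can be sketched numerically. The following is only an illustration under stated assumptions, not the authors' algorithm: the function names `chebyshev_design` and `chebyshev_t` are hypothetical, and the weights are taken as 1/(2h) at the two endpoints and 1/h at the interior points, the assignment under which the fractions sum to one. It relies on the standard fact that the deviation of x^h from its best lower-degree Chebyshev approximation on [-1, 1] is a multiple of the Chebyshev polynomial T_h, whose absolute value reaches its maximum of 1 exactly at cos(jπ/h), j = 0, ..., h.

```python
from fractions import Fraction
from math import cos, pi


def chebyshev_design(h):
    """Return [(point, weight), ...] for polynomial regression of degree h on [-1, 1]."""
    if h < 1:
        raise ValueError("h must be a positive integer")
    design = []
    for j in range(h + 1):
        point = cos(j * pi / h)
        # Endpoints (j = 0 and j = h) get weight 1/(2h); interior points get 1/h.
        weight = Fraction(1, 2 * h) if j in (0, h) else Fraction(1, h)
        design.append((point, weight))
    return design


def chebyshev_t(h, x):
    """Evaluate the Chebyshev polynomial T_h(x) by the three-term recurrence."""
    previous, current = 1.0, x
    if h == 0:
        return previous
    for _ in range(h - 1):
        previous, current = current, 2.0 * x * current - previous
    return current


design = chebyshev_design(4)
# The weights form a probability distribution over the support points.
total_weight = sum(weight for _, weight in design)
# Every support point is a point of maximum absolute deviation of T_h.
extremal = all(abs(abs(chebyshev_t(4, x)) - 1.0) < 1e-9 for x, _ in design)
```

Exact rational weights are used so that the check on the total is not affected by rounding.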


46. Uniqueness of the $L_2$ Association Scheme. S. S. Shrikhande, University
of North Carolina.


A partially balanced incomplete block design with $v = s^2$ treatments is said to have $L_2$
association scheme (R. C. Bose and T. Shimamoto, Journal of the American Statistical
Association, 47: 151-184, 1952), if the treatments can be arranged in an $s \times s$ square such
that any two treatments in the same row or the same column are 1-associates, whereas all
the other pairs are 2-associates. In this case it is easily seen that $n_1 = 2s - 2$, $n_2 = (s - 1)^2$,
$p_{11}^{1} = s - 2$, $p_{11}^{2} = 2$, where the symbols have the usual meanings. It is now proved that
for a P.B.I.B. with $s^2$ treatments with the above values for $n_1$, $n_2$, $p_{11}^{1}$ and $p_{11}^{2}$, the
association scheme is of $L_2$ type for all $s \ge 3$ excepting $s = 4$. It can be shown that a necessary
condition for existence of a symmetrical P.B.I.B. with above parameters, when $s$ is even,
is that $r - 2\lambda_1 + \lambda_2$ must be a perfect square and further
$(r - \lambda_1 + (s - 1)(\lambda_1 - \lambda_2), -1)_p = 1$
for every odd prime $p$, where the last symbol stands for the Hilbert norm-residue symbol.
The result contained in the last sentence can also be obtained from a paper submitted by
M. N. Vartak to the Annals of Mathematical Statistics. Here $r$, $\lambda_1$, $\lambda_2$ have the usual
meaning. (Received May 26, 1958.)
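
The parameter values quoted in abstract 46 can be checked by direct enumeration. The sketch below is an illustration only: the function name `l2_parameters` is hypothetical, and treatments are represented as (row, column) pairs of an s × s square. It counts first and second associates of each treatment, and the number of common first associates of each pair, and collects the distinct values found.

```python
from itertools import product


def l2_parameters(s):
    """Enumerate the square-array scheme and return the sets of observed values
    of n1, n2, and the number of common first associates for pairs that are
    first associates (p1) and second associates (p2)."""
    treatments = list(product(range(s), repeat=2))

    def first(a, b):
        # First associates: distinct treatments sharing a row or a column.
        return a != b and (a[0] == b[0] or a[1] == b[1])

    def second(a, b):
        # Second associates: all remaining distinct pairs.
        return a != b and not first(a, b)

    n1, n2, p1, p2 = set(), set(), set(), set()
    for a in treatments:
        n1.add(sum(first(a, b) for b in treatments))
        n2.add(sum(second(a, b) for b in treatments))
        for b in treatments:
            if a == b:
                continue
            common = sum(first(a, c) and first(b, c) for c in treatments)
            (p1 if first(a, b) else p2).add(common)
    return n1, n2, p1, p2


# Each set should contain a single value if the scheme is well defined.
n1, n2, p1, p2 = l2_parameters(5)
```

For s = 5 each returned set has one element, matching 2s − 2, (s − 1)², s − 2 and 2 respectively. The enumeration confirms only the parameter values; it says nothing about the uniqueness question the abstract addresses.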


47. On the existence of Wald’s sequential test. Robert A. Wijsman,
University of Illinois.


In the literature on Wald’s sequential probability ratio test the question of existence of
stopping bounds, given the two error probabilities, has never been answered. Granted
existence, the uniqueness has been shown by L. Weiss (Ann. Math. Stat. Vol. 27 (1956) pp.
1178-1181) in the case that the probability ratio is continuous. Let $\alpha_1$, $\alpha_2$ be the two error
probabilities, and let $\alpha = (\alpha_1, \alpha_2)$. In the case of continuous probability ratio, and in the
discrete case with suitable randomization, $\alpha_1$ and $\alpha_2$ are continuous functions of the stopping
bounds. Let $C$ be the non-increasing (and convex) curve of points $\alpha$ produced by coincident
stopping bounds, and let $A$ be the set in the $\alpha$-plane bounded by $C$ and the coordinate axes.
Consider a point $(\alpha_1, \alpha_2^*)$ on $C$, and separate the stopping bounds in a way which keeps $\alpha_1$
constant. Since $\alpha_2$ is a continuous function of the separation $d$ between the bounds, with
$\alpha_2(0) = \alpha_2^*$, $\alpha_2(\infty) = 0$, every value $\alpha_2$ between 0 and $\alpha_2^*$ is assumed for some $d$. It follows
that for every $\alpha$ in $A$ there exist stopping bounds. In the continuous case it is known from
Weiss’ work that $\alpha_2$ decreases monotonically from $\alpha_2^*$ to 0, as $d$ increases from 0 to $\infty$. In
that case, for the existence of stopping bounds it is also necessary that $\alpha \in A$. (Received
August 16, 1957; revised June 16, 1958.)
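
How the pair of error probabilities depends on the stopping bounds can be seen in a small special case. The sketch below is an assumption-laden illustration, not part of the abstract: it takes Bernoulli observations with success probability theta0 under one hypothesis and 1 − theta0 under the other, so that the log probability ratio is a simple random walk with steps of equal size, and it uses the classical gambler's-ruin formula for the absorption probabilities. The function names and the integer bounds `upper` and `lower` (measured in steps) are hypothetical. Because the bounds are restricted to a lattice here, only a discrete set of error pairs is produced; the abstract notes that suitable randomization is needed in the discrete case to obtain continuity.

```python
def prob_hit_upper(p, upper, lower):
    """Probability that a walk from 0 with up-step probability p reaches
    +upper before -lower (gambler's-ruin formula)."""
    if p == 0.5:
        return lower / (upper + lower)
    ratio = (1.0 - p) / p
    return (1.0 - ratio ** lower) / (1.0 - ratio ** (upper + lower))


def error_probabilities(theta0, upper, lower):
    """Return (alpha1, alpha2) for the symmetric Bernoulli test.

    alpha1: probability of stopping at the upper bound when theta0 is true.
    alpha2: probability of stopping at the lower bound when 1 - theta0 is true.
    """
    alpha1 = prob_hit_upper(theta0, upper, lower)
    alpha2 = 1.0 - prob_hit_upper(1.0 - theta0, upper, lower)
    return alpha1, alpha2


narrow = error_probabilities(0.4, 3, 3)
wide = error_probabilities(0.4, 5, 5)
```

With theta0 = 0.4, widening the symmetric bounds from 3 to 5 steps lowers both error probabilities, which is the qualitative behaviour the separation argument in the abstract relies on.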


NEWS AND NOTICES


Readers are invited to submit to the Secretary of The Institute news items of interest.
Personal Items 


Gertrude Mary Cox, director of the Institute of Statistics, Consolidated Uni- 
versity of North Carolina, was awarded an honorary Doctor of Science degree 
by Iowa State College during its Founder’s Day centennial observance; she was 
cited as “teacher, researcher, leader and administrator in the field of statistics.”

George Waddel Snedecor, who was primarily responsible for the development 
of the Iowa State College Statistical Laboratory, was awarded an honorary
Doctor of Science degree by the college during its Founder’s Day centennial
observance and cited as “teacher, author, pioneer in experimental statistics.” 
He has been a visiting professor at North Carolina State College, in the Insti- 
tute of Statistics, since 1957. 


Allan G. Anderson has resigned his position as Chief Statistician at the General 
Tire & Rubber Company, Akron, Ohio, to accept a position as Professor and 
Head of the Department of Mathematics at Western Kentucky State College, 
Bowling Green, Kentucky. 

Dr. Ernst P. Billeter has been appointed Professor of Statistics and Automa- 
tion at the University of Fribourg (Switzerland). He has also been elected 
Director of the Institute for Research in Automation, which has recently been 
founded at this University. The aim of this Institute is to do basic research 
work in application of automation in business and to introduce businessmen and 
their staff members, as well as students in economics, into the general methods 
of programming electronic data processing machines. Furthermore, this Insti- 
tute will help businessmen in solving their problems in operations research, 
market research, and statistical quality control. 

Dr. Uttam Chand has a new position as Officer on Special Duty (Training) 
in the Central Statistical Organisation (Cabinet Sectt.), New Delhi, India. 

Dr. Frank A. Haight, formerly of Auckland University College, New Zealand, 
has returned to the United States to become Associate Mathematician at the 
Institute of Transportation and Traffic Engineering, U. C. L. A. 

W. Robert Hydeman has accepted an appointment as Manager of Computer
Systems at Touche, Niven, Bailey & Smart in their Executive Offices located 
at 1292 National Bank Building, Detroit 26, Michigan. 

Richard C. Kao, formerly Research Associate, Operations Research Depart- 
ment, Engineering Research Institute, and Lecturer, Department of Mathe- 
matics, University of Michigan, Ann Arbor, is now Associate Mathematician, 
System Development Corporation, Santa Monica, California. 

Mr. Frederick G. King recently took a position as Senior Scientist with the 
Armour Research Foundation and now lives in Evanston, Illinois. He was 
formerly with the Ballistic Research Laboratories at Aberdeen Proving Ground, 
Maryland. 

1/Lt. Melville R. Klauber is now stationed with the 341st Air Refueling
Squadron, Dow Air Force Base, Maine. 

Richard A. Lamm, formerly at the Biological Warfare Laboratories, Fort 
Detrick, Maryland, is now a Statistician with the American Cyanamid Company 
at Pearl River, New York. 

Dr. William G. Madow has been advanced to the position of Staff Scientist 
of Stanford Research Institute, Menlo Park, California. 

William F. Taylor has left the School of Aviation Medicine, Randolph Air 
Force Base, Texas, to become Associate Professor of Public Health in the Divi- 
sion of Biostatistics of the School of Public Health, University of California at 
Berkeley. 

H. Robert van der Vaart, who has been a visiting professor at the Department 
of Experimental Statistics of the Institute of Statistics at Raleigh, North Caro- 
lina, from January, 1957, until the end of January, 1958, will be a visiting 
associate professor (and hold a scholarship from the Netherlands Organization 
for Pure Research, Z.W.O.) at the Department of Statistics of the University 
of Chicago. 

Ronald E. Walpole has completed the requirements for the Ph.D. degree in 
Statistics at Virginia Polytechnic Institute and has assumed the position of 
Head of the Department of Mathematics and Statistics at Roanoke College. 




New Members 
The following persons have been elected to membership in The Institute 
February 3, 1958, to May 13, 1958 
Agan, Miss Martha L., B.S. (Univ. of California, Los Angeles), Medical Record Librarian, 
V. A. Center, Los Angeles, California; 1814 Holmby Avenue, Los Angeles 25, Cali- 
fornia. 
Alling, David W., M.D. (Univ. of Rochester), Student, Cornell University, Ithaca, New 
York; 1124 Ellis Hollow Road, Ithaca, New York. 
Baker, Laurence H., B.S. (Iowa State College), Research Assistant, Department of Animal 
Husbandry, University of Minnesota, St. Paul, Minnesota. 
Barnard, George A., M.A. (Cambridge), Professor of Mathematical Statistics, Imperial
College, Mathematics Department, University of London, Exhibition Road, London S.W.
7, England.







Berliner, Paul, M.B.A. (City College of New York), Engineer, Radio Corporation of America, 
Depart. 660, 18-8, 416 South 6th St., Harrison, New Jersey. 

Blair, Charles R., B.S. (George Washington Univ.), Mathematician, National Security 
Agency, Ft. George G. Meade, Maryland, and student, George Washington Univ., 
Washington, D. C.; 536 Beacon Road, Silver Spring, Maryland. 

Cohen, F. A., M.A. (U.C.L.A.), Teaching Assistant, University of California at Los An- 
geles, Los Angeles 24, California; 11651 Gorham Ave., #4, Los Angeles 49, California. 

Cunia, Tiberius, M.S. (McGill Univ.), Forest Engineer, Canadian International Paper Co., 
1461 Sunlife Building, Montreal, Quebec, Canada; 5562 Basile Patenaude Pl., Montreal, 
Quebec, Canada. 

Dutt, John E., M.A. (Columbia Univ.), Mathematician, MIT Lincoln Laboratory, Lexing- 
ton, Massachusetts; 56 Arlington Street, Newton, Massachusetts. 

Elashoff, Robert M., A.M. (Boston Univ.), Student and Laboratory Teacher in
Biostatistics, Harvard School of Public Health, 55 Shattuck Street, Boston, Massachusetts.

Ferrin, Kenneth M., M.A. (U.C.L.A.), student, U.C.L.A.; 1412 Midvale Avenue, West Los 
Angeles 24, California. 

Federowicz, Alexander J., B.S. (Carnegie Inst. of Tech.), Graduate Student, Carnegie 
Institute of Technology, Pittsburgh 13, Pa.; 5876 Solway Street, Pittsburgh 17, Pa. 
Fimple, Melvin D., M.B.A. (Univ. of Buffalo), Components Engineer, Stromberg-Carlson
Company, Rochester, New York; 21 Carthage Drive, Rochester 21, New York.

Fink, Lester H., B.S. in E.E. (Univ. of Pennsylvania), Engineer, Electrical Research
Section, Philadelphia Electric Co., Philadelphia, Pa.; Ferry and Iron Hill Roads,
Doylestown R. D. 1, Pa.

Freimer, Marshall Leonard, A.M. (Harvard Univ.), Student, Harvard University, Dept. 
of Statistics, Cambridge 38, Massachusetts; Lincoln Laboratory, P. O. Box 73,
Lexington 73, Massachusetts.

Grossling, Bernardo F., Ph.D. (London Univ.), Senior Research Geophysicist, California 
Research Corporation, P. O. Box 446, La Habra, California. 

Howell, John Robert, M.S. (Univ. of Florida), graduate student, University of Florida, 
Gainesville, Florida; Mathematics Department, University of Florida, Gainesville, Florida.

Johnson, Jerome R., M.S. (Purdue Univ.), Chief, Rocket, Mortar & Recoilless Ammuni- 
tion Section, Surveillance Branch, Weapon Systems Lab., Ballistic Research Lab., 
Aberdeen Proving Ground, Maryland; 860 Ontario Street, Havre de Grace, Maryland. 

Kaula, William M., M.S. (Ohio State Univ.), Geodesist, U. S. Army Map Service,
Washington 25, D. C.; 5202 Baltimore Avenue, Washington 16, D. C.

Lerner, Gary B., B.S. (Michigan State Univ.), Actuarial Student, Metropolitan Life In- 
surance Company, 1 Madison Ave., New York, N. Y.; 731 Scranton Ave., East Rock- 
away, New York. 

Lewis, John S., B.S. (Carnegie Inst. of Tech.), Research Assistant, Department of Mathe- 
matics, Carnegie Institute of Technology, Pittsburgh 18, Pennsylvania. 

Meagher, Jack R., A.M. (Univ. of Michigan), Associate Professor, Mathematics Department, 
Western Michigan University, Kalamazoo, Michigan. 

Posener, Ludwig N., Ph.D. (Univ. of Berlin), Lecturer of Statistics and Applied Mathe- 
matics, University of Tel Aviv, 155 Herzl Street, Tel Aviv, Israel; 22 Pinsker Street, 
Rehovot, Israel. 


Raj, Des, Ph.D. (Calcutta Univ.), Associate Professor, American University of Beirut, 
Beirut, Lebanon. 

Sawits, Murray B., B.S. (City College of New York), Senior Statistician, Rayco Mfg. Co., 
22 Straight St., Paterson, New Jersey; 1420 Grand Concourse, Box 56, New York, N.Y. 

Suzuki, Yukio, Member of the Institute of Statistical Mathematics, No. 1,
Azabu-Fujimi-Cho, Minato-Ku, Tokyo, Japan.

Thompson, Robert J., B.S. (Drake Univ.), Senior Research Engineer, Convair Pomona, 
Pomona, California; 447 Celia Avenue, Pomona, Calif. 

Zadoff, Solomon A., A.M. (Columbia Univ.), Research Engineer, Sperry Gyroscope Co.,
Great Neck, New York, and student, Columbia University, New York, New York;
193-18 37th Ave., Flushing 59, New York.

Zayachkowski, Walter, M.A. (Univ. of Saskatchewan), Graduate Student, Dept. of Mathe- 
matics, University of Alberta, Edmonton, Alberta, Canada. 


Zimmer, William J., M.S. (Purdue Univ.), Research Fellow, Statistical Laboratory, Purdue 
University, Lafayette, Indiana. 




EXPANDED TRAINING PROGRAM IN BIOMETRICS TO BE 
OFFERED AT IOWA STATE COLLEGE STATISTICAL 
CENTER 


The Department of Statistics and the Statistical Laboratory of Iowa State 
College will substantially expand their present graduate training program in 
biostatistics with the aid of a five-year grant from the National Institutes of 
Health. This award will provide support for several graduate students in statis- 
tics per year as candidates for the M.S. or Ph.D. degree, with a view to stimu- 
lating their interest in biometry, medical statistics or public health as a career. 
It will also give partial support to one staff member so that he can devote more 
time to those areas of statistical application. 

One feature of the expanded program is that biostatistics trainees, while 
working toward masters’ or doctors’ degrees in statistics, will spend up to three 
months each year at some selected medical school or public health center to 
round out their experience through contact with biometric data in the field or
laboratory. So far, three new traineeships have been established for the 1958-59
year. Further details about the expanded biostatistics program and application 
forms for traineeships for the 1959-60 year may be obtained from the Depart- 
ment of Statistics, Iowa State College. 




NATIONAL REGISTER OF SCIENTIFIC AND TECHNICAL 
PERSONNEL 


The American Mathematical Society at the request of the National Science
Foundation is assembling and maintaining a register of mathematicians and 
mathematical scientists. The Mathematics Register is a section of the National 
Register of Scientific and Technical Personnel, which is an official responsibility 
of the NSF. The purpose of the Register is to provide up-to-date information 
on the scientific manpower resources of the United States. 

As a result of the splendid cooperation accorded to the project by most of the 
mathematicians and mathematical scientists who have received questionnaires 
to fill in, the mathematical section of the Register is now remarkably complete. 
However, there are still a few gaps to be filled in. If you have received a National 
Register questionnaire from the Society, please fill it in now and send it to the 
Headquarters Offices of the Society at 190 Hope Street, Providence 6, Rhode 
Island. If you have never received a questionnaire and feel that you are qualified
for inclusion in the Register, please drop a note to that effect to the Society at
this address. 


EDUCATIONAL TESTING SERVICE FELLOWSHIPS


The Educational Testing Service is offering for 1959-60 its twelfth series of 
research fellowships in psychometrics leading to the Ph.D. degree at Princeton 
University. Open to men who are acceptable to the Graduate School of the 
University, the two fellowships each carry a stipend of $2,650 a year and are 
normally renewable. Fellows will be engaged in part-time research in the general 
area of psychological measurement at the offices of the Educational Testing Serv- 
ice and will, in addition, carry a normal program of studies in the Graduate 
School. 

Suitable undergraduate preparation may consist either of a major in psychol- 
ogy with supporting work in mathematics, or a major in mathematics together 
with some work in psychology. However, in choosing fellows, primary emphasis 
is given to superior scholastic attainment and research interests rather than to 
specific course preparation. 

The closing date for completing applications is January 2, 1959. Information 
and application blanks will be available about September 15 and may be obtained 
from: Director of Psychometric Fellowship Program, Educational Testing 
Service, 20 Nassau Street, Princeton, New Jersey. 




REPORT OF THE AMES, IOWA, MEETING OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


The seventy-sixth meeting of The Institute of Mathematical Statistics, a 
Central Region Meeting, was held in the Gallery of the Memorial Union on 
the campus of Iowa State College at Ames, Iowa, on April 3-5, 1958. These 
dates were within the period during which Iowa State College was observing its 
Centennial Celebration. 

A Special Invited Address, “Subjective Judgements and Statistical Practice,” 
was delivered by Professor L. J. Savage of the University of Chicago. 

On Friday evening, April 4, a banquet was held in the Great Hall of the 
Memorial Union with Professor T. A. Bancroft presiding. After dinner Dean 
Richard S. Bear of the Division of Science at Iowa State College addressed the
assembled guests on the history of statistics at Ames. This was followed by 
entertainment by graduate students at Ames. 

The Chairman of the Program Committee for the meeting was Jack Silber, 
Roosevelt University. The Assistant Secretary for the meeting was Herbert T. 
David, Iowa State College. 

Ninety-six people registered for the meetings, including the following 52 mem- 
bers of The Institute: 


D. Huntsberger, Meyer Dwass, Preston C. Hammer, Emil H. Jebe, Howard L. Taylor,
H. O. Hartley, W. M. Gilbert, Paul G. Homeyer, Franklin A. Graybill, W. H. Horton,
Oscar Kempthorne, Russell N. Bradt, D. R. Truax, Stanley Isaacson, I. R. Savage,
Lorraine Schwartz, George Zyskind, Helen Bozivich, Robert V. Hogg, Leo Katz,
Leonard J. Savage, Roger S. McCullough, Howard L. Jones, Scott Krane,
F. E. Satterthwaite, Bernard Ostle, Virgil L. Anderson, R. W. Kennard,
Herbert T. David, Robert F. White, A. W. Wortham, J. D. Hromi, Jack Silber,
John F. Pauls, Timon A. Walther, Edward C. Bryant, Richard L. Beatty,
Richard L. Carter, Byron Brown, S. N. Roy, Betty K. Stewart, Z. Govindarajulu,
M. B. Wilk, H. Robert van der Vaart, William H. Williams, T. A. Bancroft,
William J. Zimmer, R. A. Wijsman, G. Tintner, John Gurland, Sidney Addelman,
David L. Wallace.


The program for the meeting was as follows: 


THURSDAY, APRIL 3, 1958 
9:00 a.m. Invited Papers on the Design of Experiments 


Chairman: Virgil L. Anderson, Purdue University.
1. A Comparison of Designs for Exploration of Response Surfaces, Le Roy Folks,
Iowa State College.
2. The Staircase Design, F. A. Graybill, Oklahoma State University.


10:30 a.m. Invited Papers on the Problem of Nuisance Parameters 


Chairman: T. A. Bancroft, Iowa State College.
1. Testing the Equality of the Means of Two Normal Populations, John Gurland,
Iowa State College.
2. The Behrens-Fisher Problem: A Critical Review and a Subjective Approach, David
L. Wallace, The University of Chicago.


2:00 p.m. Special Invited Address 


Chairman: Oscar Kempthorne, Iowa State College.
Subjective Judgements and Statistical Practice, L. J. Savage, The University of Chicago.


4:00 p.m. Invited Paper on the Analysis of Variance 


Chairman: R. V. Hogg, University of Iowa.
1. Multivariate Analysis of Variance under Models I and II and Mixed Models, S. N.
Roy, University of North Carolina and University of Minnesota.


FRIDAY, APRIL 4, 1958 
9:00 a.m. Invited Papers on Statistical Problems in Econometric Theory 


Chairman: J. Silber, Roosevelt University.
1. A New Method for Fitting the Logistic Function, Gerhard Tintner, Iowa State
College.
2. The Effects of Incomplete Specification on the Results of Estimating Procedures,
Leonid Hurwicz, University of Minnesota.


10:30 a.m. Contributed Papers I 


Chairman: Albert Wortham, Texas Instruments.
1. Bias and Confidence in Not-Quite Large Samples (Preliminary Report), John W.
Tukey, Princeton University (By title).
2. On a Multivariate Gamma Distribution, P. R. Krishnaiah and M. M. Rao,
University of Minnesota.
3. On the Fitting of Some Contagious Distributions, S. K. Katti and John Gurland,
Iowa State College.
4. Minimal Complete Classes of Tests, D. L. Burkholder, University of Illinois.
5. An Identity of Use in Non-Linear Least Squares, M. B. Wilk, Bell Telephone
Laboratories.
6. Contributions to the Theory of Rank Order Statistics: The One Sample Case, I.
Richard Savage, University of Minnesota.
7. A Rule for Action Based on Percentage Changes in the Sample Mean, D. B. Owen,
Sandia Corporation (By title).
8. An Expression for the Cumulative Distribution Function of the Non-Central
t-Distribution, D. B. Owen, Sandia Corporation (By title).
9. Some Formulae for the Exact Computation of Probabilities in Wilcoxon’s Two Sample
Test, H. Robert van der Vaart, University of Chicago.


2:00 p.m. Invited Papers on Non-Parametric Statistics 


Chairman: B. Ostle, Sandia Corporation.
1. Some Null Rank Distributions Derivable by Reflection, H. T. David, Iowa State
College.
2. Order Statistics in the Poisson Process, Meyer Dwass, Northwestern University.


3:30 p.m. Invited Papers on the Use of Electronic Computers in Statistics


Chairman: M. B. Wilk, Bell Telephone Laboratories.
1. Theoretical Possibilities of Computers, P. C. Hammer, University of Wisconsin.
2. Linear Programming on the IBM-650, H. O. Hartley, Iowa State College.


SATURDAY, APRIL 5, 1958 
9:00 a.m. Contributed Papers II 


Chairman: W. H. Horton, Westinghouse Electric Company. 

1. Biases in Prediction by Regression for Certain Incompletely Specified Models,
Harold Larson, Iowa State College.
2. Notes on the Spearman-Karber Procedures in Bioassay (Preliminary Report),
Byron W. Brown, Jr., Louisiana State University.
3. Approximate Solutions for the Probability Density of Zero-Crossing Intervals in a
Gaussian Process, J. A. McFadden, Naval Ordnance Laboratory and Purdue
University (introduced by Judah Rosenblatt).
4. The Fourth Product-Moment of a Binary Random Process, J. A. McFadden, Purdue
University (introduced by Judah Rosenblatt). (By title)
5. Limiting Distributions of k-Sample Test Criteria of Kolmogorov-Smirnov-v. Mises
Type, J. Kiefer, Cornell University. (By title)
6. Independence of Statistics and Characterization of the Multivariate Normal
Distribution, S. G. Ghurye, University of Chicago, and Ingram Olkin, Michigan State
University.
7. Unbiased Regression Estimators, William H. Williams, Iowa State College.
8. Maximum Likelihood Estimation from Incomplete Data for Continuous Distribution,
Scott Krane, Iowa State College.
9. Unbiased Ratio Estimators in Stratified Sampling, Jose Nieto, Iowa State College.
10. Similar Tests of Hypothesis Concerning the Ratio of Mean to Standard Deviation in
a Normal Population, Robert A. Wijsman, University of Illinois.
11. Births and Deaths in Parallel, J. Silber, Roosevelt University.


10:30 a.m. Invited Papers on the Theory of Estimation 


Chairman: R. N. Bradt, University of Kansas.
1. Some Interval Estimation Problems, Robert J. Buehler, Iowa State College.
2. Inadmissible Samples and Confidence Limits, Howard L. Jones, Illinois Bell
Telephone Company.


On Saturday, April 5, at 2:00 p.m. Dr. S. N. Roy of the University of North
Carolina and the University of Minnesota presented a special seminar for mem- 
bers of the Statistical Laboratory at Iowa State College on “Some Recent Work 
on Univariate and Multivariate Components Analysis.” The people attending 
the meetings of The Institute were invited to this seminar. 




PUBLICATIONS RECEIVED 


Arrow, Kenneth J., Samuel Karlin, and Herbert Scarf, Studies in the Mathematical
Theory of Inventory and Production, Stanford University Press, Stanford, California,
x + 340 pp.

A Comparative Study of Statistical Analysis and Other Methods of Computing Ore Reserves,
United States Department of the Interior, Bureau of Mines, Washington 25, D. C.

Supplementary List of Publications of the National Bureau of Standards, July 1, 1947, to
June 30, 1957. Supplement to National Bureau of Standards Circular 460. (Supersedes
Supplement to Circular 460, December 30, 1952.) Issued May 14, 1958, 373 pages, $1.50.
(Order from Superintendent of Documents, U. S. Government Printing Office,
Washington 25, D. C.)